Novel guide RNA/Cas endonuclease systems

ABSTRACT

Compositions and methods are provided for novel guide RNA/Cas endonuclease systems. Type II Cas9 endonuclease systems originating from Brevibacillus laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811, and Sporocytophaga myxococcoides are described herein. The present disclosure also describes methods for genome modification of a target sequence in the genome of a cell, for gene editing, and for inserting a polynucleotide of interest into the genome of a cell.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is the 371 national stage entry of International Application Number PCT/US2016/032073, filed on 12 May 2016, which claims the benefit of U.S. Provisional Application No. 62/162,377, filed May 15, 2015, U.S. Provisional Application No. 62/162,353, filed May 15, 2015, and U.S. Provisional Application No. 62/196,535, filed Jul. 24, 2015, each of which is incorporated herein in its entirety by reference.

FIELD

The disclosure relates to the field of plant molecular biology, in particular, to compositions for novel guide RNA/Cas endonuclease systems and compositions and methods for altering the genome of a cell.

REFERENCE TO SEQUENCE LISTING SUBMITTED ELECTRONICALLY

The official copy of the sequence listing is submitted electronically via EFS-Web as an ASCII formatted sequence listing with a file named 20160502_BB2539PCT_SequenceListing.txt created on May 2, 2016 and having a size of 236 kilobytes and is filed concurrently with the specification. The sequence listing contained in this ASCII formatted document is part of the specification and is herein incorporated by reference in its entirety.

BACKGROUND

Recombinant DNA technology has made it possible to insert DNA sequences at targeted genomic locations and/or modify (edit) specific endogenous chromosomal sequences, thus altering the organism's phenotype. Site-specific integration techniques, which employ site-specific recombination systems, as well as other types of recombination technologies, have been used to generate targeted insertions of genes of interest in a variety of organisms. Genome-editing techniques such as designer zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), or homing meganucleases are available for producing targeted genome perturbations, but these systems tend to have a low specificity and employ designed nucleases that need to be redesigned for each target site, which renders them costly and time-consuming to prepare.

Although several approaches have been developed to target a specific site for modification in the genome of an organism, there still remains a need for new genome engineering technologies that are affordable, easy to set up, scalable, and amenable to targeting multiple positions within the genome of an organism.

BRIEF SUMMARY

Compositions and methods are provided for rapid characterization of novel Cas endonuclease systems and the elements comprising such systems, including, but not limited to, rapid characterization of PAM sequences, guide RNA elements and Cas endonucleases.

In one embodiment of the disclosure, the guide RNA is a guide RNA capable of forming a guide RNA/Cas endonuclease complex, wherein said guide RNA/Cas endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said guide RNA is a duplex molecule comprising a chimeric non-naturally occurring crRNA and a tracrRNA, wherein said chimeric non-naturally occurring crRNA comprises a variable targeting domain capable of hybridizing to said target sequence, wherein said tracrRNA originates from an organism selected from the group consisting of Brevibacillus laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755.

In another embodiment of the disclosure, the guide RNA is a guide RNA capable of forming a guide RNA/Cas endonuclease complex, wherein said guide RNA/Cas endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said guide RNA is a single molecule comprising a chimeric non-naturally occurring crRNA linked to a tracrRNA originating from an organism selected from the group consisting of Brevibacillus laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755, wherein said chimeric non-naturally occurring crRNA comprises a variable targeting domain capable of hybridizing to said target sequence.

In another embodiment of the disclosure, the guide RNA is a guide RNA capable of forming a guide RNA/Cas endonuclease complex, wherein said guide RNA/Cas endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said guide RNA is a duplex molecule comprising a chimeric non-naturally occurring crRNA and a tracrRNA, wherein said chimeric non-naturally occurring crRNA comprises at least a fragment of a crRNA originating from an organism selected from the group consisting of Brevibacillus laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755, wherein said chimeric non-naturally occurring crRNA comprises a variable targeting domain capable of hybridizing to said target sequence.

In another embodiment of the disclosure, the guide RNA is a guide RNA capable of forming a guide RNA/Cas endonuclease complex, wherein said guide RNA/Cas endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said guide RNA is a single molecule comprising a tracrRNA linked to a chimeric non-naturally occurring crRNA comprising at least a fragment of a crRNA originating from an organism selected from the group consisting of Brevibacillus laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755, wherein said chimeric non-naturally occurring crRNA comprises a variable targeting domain capable of hybridizing to said target sequence.
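The single-molecule guide RNA embodiments above can be pictured as a concatenation of named parts. The following Python sketch is illustrative only: the 5′ to 3′ ordering (variable targeting domain, crRNA repeat region, loop, anti-repeat, 3′ tracrRNA) is an assumption inferred from the part names used in Table 1, the helper names `to_rna` and `build_sgrna` are hypothetical, and every sequence shown is a made-up placeholder, not a sequence from the Sequence Listing.

```python
RNA_BASES = set("ACGU")


def to_rna(seq):
    """Upper-case a sequence, convert DNA T to RNA U, and validate the alphabet."""
    rna = seq.upper().replace("T", "U")
    unexpected = set(rna) - RNA_BASES
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return rna


def build_sgrna(variable_targeting_domain, crrna_repeat, loop, anti_repeat, tracr_3prime):
    """Join the parts of a single guide RNA in 5' to 3' order (assumed layout)."""
    parts = (variable_targeting_domain, crrna_repeat, loop, anti_repeat, tracr_3prime)
    return "".join(to_rna(part) for part in parts)


# Placeholder sequences for illustration only (not from the Sequence Listing).
sgrna = build_sgrna(
    variable_targeting_domain="ACGTACGTACGTACGTACGT",  # hypothetical 20 nt spacer
    crrna_repeat="GCUAUAGC",                            # hypothetical repeat fragment
    loop="GAAA",                                        # hypothetical linker loop
    anti_repeat="GCUAUAGC",                             # hypothetical anti-repeat
    tracr_3prime="AAGGCUUUU",                           # hypothetical 3' tracrRNA
)
print(len(sgrna), sgrna)
```

Only the variable targeting domain changes from target to target in this picture; the remaining parts would be held fixed for a given Cas9 ortholog.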

Also provided are nucleic acid constructs, plants, plant cells, explants, seeds and grain having an altered target site or altered polynucleotide of interest produced by the methods described herein. Additional embodiments of the methods and compositions of the present disclosure are shown herein.

BRIEF DESCRIPTION OF THE DRAWINGS AND THE SEQUENCE LISTING

The disclosure can be more fully understood from the following detailed description and the accompanying drawings and Sequence Listing, which form a part of this application. The sequence descriptions and sequence listing attached hereto comply with the rules governing nucleotide and amino acid sequence disclosures in patent applications as set forth in 37 C.F.R. §§ 1.821-1.825. The sequence descriptions contain the three letter codes for amino acids as defined in 37 C.F.R. §§ 1.821-1.825, which are incorporated herein by reference.

FIGURES

FIG. 1 shows a diagram of the formation of a full Oligoduplex II comprising a restriction enzyme recognition site (RE1), a target sequence and a randomized Protospacer-Adjacent-Motif (PAM) sequence.

FIG. 2 shows a diagram of the design and construction of a 5 nucleotide (5N) randomized Protospacer-Adjacent-Motif (PAM) plasmid library and host cell library. RE1=restriction endonuclease 1, RE2=restriction endonuclease 2.

FIG. 3 shows a diagram of the production of enriched PAM-sided products for deep sequencing and identification of PAM preferences.

FIG. 4 depicts the PAM sequence distribution from a 5 nucleotide (5N) randomized Protospacer-Adjacent-Motif (PAM) plasmid library.

FIG. 5 shows the PAM preferences (NGGNG) for Streptococcus thermophilus CRISPR3 (Sth3) Cas9 endonuclease in both 50 nM and 100 nM digests.

FIG. 6 shows the PAM preferences (NGG) for Streptococcus pyogenes (Spy) Cas9 endonuclease in both 50 nM and 100 nM digests.

FIG. 7 shows the effect of decreasing Sth3 and Spy Cas9-crRNA-tracrRNA complex concentration and digestion time to determine the minimal Sth3 and Spy Cas9 concentration and shortest digestion time where PCR amplified cleavage products may still be obtained from the randomized PAM plasmid library.

FIG. 8 shows the PAM preferences (NGGNG) for Streptococcus thermophilus CRISPR3 (Sth3) Cas9 endonuclease positive controls in both 50 nM and 100 nM digests.

FIG. 9 shows the PAM preferences (NGG) for Streptococcus pyogenes (Spy) Cas9 endonuclease positive controls in both 50 nM and 100 nM digests.

FIG. 10 shows the PAM preferences (NGGNG) observed in the libraries minimally digested with Streptococcus thermophilus Sth3 (0.5 nM-60 min and 50 nM-1 min) compared to those exhibited by the respective 50 nM-60 minute positive control.

FIG. 11 shows the PAM preferences (NGG) observed in the libraries minimally digested with Streptococcus pyogenes Spy (0.5 nM-60 min and 50 nM-1 min) compared to those exhibited by the respective 50 nM-60 minute positive control.

FIG. 12 shows the PAM preferences (NGGNG) for Streptococcus thermophilus CRISPR3 (Sth3) Cas9 endonuclease guided by a single guide RNA (sgRNA) or guided by a crRNA:tracrRNA duplex. The NGGNG PAM preference is nearly identical regardless of the type of guide RNA used.

FIG. 13 shows the PAM preferences (NGG) for Streptococcus pyogenes (Spy) Cas9 endonuclease guided by a single guide RNA (sgRNA) or guided by a crRNA:tracrRNA duplex. The NGG PAM preference is nearly identical regardless of the type of guide RNA used.

FIG. 14 shows the PAM preferences for Streptococcus thermophilus CRISPR3 (Sth3) Cas9 endonuclease positive controls for comparison of a 5N randomized PAM plasmid DNA library and a 7N randomized PAM plasmid DNA library.

FIG. 15 shows the PAM preferences (NGG) for Streptococcus pyogenes (Spy) Cas9 endonuclease positive controls for comparison of a 5N randomized PAM plasmid DNA library and a 7N randomized PAM plasmid DNA library.

FIG. 16 shows the PAM preferences (NNAGAAW) for Streptococcus thermophilus CRISPR1 (Sth1) Cas9 endonuclease in both 50 nM and 0.5 nM digests.

FIG. 17-A shows a genomic DNA region from Brevibacillus laterosporus representing the Type II CRISPR-Cas system described herein. FIG. 17-B lists 8 repeat sequences (SEQ ID NOs:37-44) of the genomic DNA region from Brevibacillus laterosporus.

FIG. 18 shows a diagram of the “direct” scenario and the “reverse” scenario of the tracrRNA and CRISPR array to determine a guide RNA for the Cas9 protein identified from Brevibacillus laterosporus (Blat).

FIG. 19 shows the secondary structure of the “direct” tracrRNA region downstream of the anti-repeat (SEQ ID NO: 68) from Brevibacillus laterosporus.

FIG. 20 shows the secondary structure of the “reverse” tracrRNA region downstream of the anti-repeat (SEQ ID NO: 69) from Brevibacillus laterosporus.

FIG. 21 shows an agarose gel with reaction products, indicating that only the “direct” sgRNA (dirsgRNA), but not the “reverse” sgRNA (revsgRNA), supported plasmid library cleavage in combination with a Cas9 endonuclease originating from Brevibacillus laterosporus (BlatCas9).

FIG. 22 shows the effect of decreasing BlatCas9 concentration and digestion time to determine the minimal BlatCas9 concentration and shortest digestion time where PCR amplified cleavage products may still be obtained from the randomized PAM plasmid library.

FIG. 23 shows the PAM preferences (NNNNCND) for Brevibacillus laterosporus (Blat) Cas9 endonuclease in both 50 nM and 0.5 nM digests.

FIG. 24 depicts sequencing results indicating that plasmid DNA cleavage occurred in the protospacer 3 bp away from the PAM sequence.

FIG. 25 shows a genomic DNA region from Lactobacillus reuteri MIc3 representing an example of a Type II CRISPR-Cas system described herein.

FIG. 26 shows a genomic DNA region from Lactobacillus rossiae DSM 15814 representing an example of a Type II CRISPR-Cas system described herein.

FIG. 27 shows a genomic DNA region from Pediococcus pentosaceus SL4 representing an example of a Type II CRISPR-Cas system described herein.

FIG. 28 shows a genomic DNA region from Lactobacillus nodensis JCM 14932 representing an example of a Type II CRISPR-Cas system described herein.

FIG. 29 shows a genomic DNA region from Sulfurospirillum sp. SCADC representing an example of a Type II CRISPR-Cas system described herein.

FIG. 30 shows a genomic DNA region from Bifidobacterium thermophilum DSM 20210 representing an example of a Type II CRISPR-Cas system described herein.

FIG. 31 shows a genomic DNA region from Loktanella vestfoldensis representing an example of a Type II CRISPR-Cas system described herein.

FIG. 32 shows a genomic DNA region from Sphingomonas sanxanigenens NX02 representing an example of a Type II CRISPR-Cas system described herein.

FIG. 33 shows a genomic DNA region from Epilithonimonas tenax DSM 16811 representing an example of a Type II CRISPR-Cas system described herein.

FIG. 34 shows a genomic DNA region from Sporocytophaga myxococcoides representing an example of a Type II CRISPR-Cas system described herein.

FIG. 35 shows a genomic DNA region from Psychroflexus torquis ATCC 700755 representing an example of a Type II CRISPR-Cas system described herein.

FIG. 36 shows Bifidobacterium thermophilum (Bthe) Cas9 non-homologous end-joining (NHEJ) mutation frequencies with different single guide RNA (sgRNA) variable targeting domain (spacer) lengths (20 nt, 25 nt and 29 nt) at 2 maize target sites. NHEJ mutations were detected by deep sequencing 2 days after transformation.

SEQUENCES

TABLE 1
Summary of Nucleic Acid and Protein SEQ ID Numbers

Description | Nucleic acid SEQ ID NO. | Protein SEQ ID NO.
Target sequence T1 | 1 (80 bases) | -
Single oligonucleotide GG-821N | 2 (47 bases) | -
Oligonucleotide GG-820 | 3 (44 bases) | -
TK-119 primer | 4 (22 bases) | -
pUC-dir primer | 5 (22 bases) | -
JKYS800.1 forward primer | 6 (59 bases) | -
JKYS803 reverse primer | 7 (53 bases) | -
Universal Forward primer | 8 (43 bases) | -
Universal Reverse primer | 9 (18 bases) | -
Sth1-dir primer | 10 (34 bases) | -
Sth1-rev primer | 11 (27 bases) | -
Sth3-dir primer | 12 (26 bases) | -
Sth3-rev primer | 13 (30 bases) | -
Spy-dir primer | 14 (38 bases) | -
Spy-rev primer | 15 (32 bases) | -
Streptococcus thermophilus (Sth3) crRNA | 16 (42 bases) | -
Streptococcus thermophilus (Sth3) tracrRNA | 17 (78 bases) | -
Streptococcus pyogenes (Spy) crRNA | 18 (42 bases) | -
Streptococcus pyogenes (Spy) tracrRNA | 19 (78 bases) | -
TK-117 | 20 (31 bases) | -
TK-111 | 21 (30 bases) | -
JKYS807.1 primer | 22 (56 bases) | -
JKYS807.2 primer | 23 (56 bases) | -
JKYS807.3 primer | 24 (56 bases) | -
JKYS807.4 primer | 25 (56 bases) | -
Sth3 sgRNA | 26 (123 bases) | -
Spy sgRNA | 27 (105 bases) | -
GG-940-G oligonucleotide | 28 (59 bases) | -
GG-940-C oligonucleotide | 29 (59 bases) | -
GG-940-A oligonucleotide | 30 (59 bases) | -
GG-940-T oligonucleotide | 31 (59 bases) | -
JKYS812 | 32 (49 bases) | -
Streptococcus thermophilus CRISPR1 (Sth1) crRNA | 33 (42 bases) | -
Streptococcus thermophilus CRISPR1 Sth1 tracrRNA | 34 (80 bases) | -
Streptococcus thermophilus CRISPR3 (Sth3) Cas9 | - | 35 (1388 aa)
Cas9 single long open-reading-frame from the Brevibacillus laterosporus bacterial strain SSP360D4 | 36 (3279 bases) | -
Repeat 1, Brevibacillus laterosporus SSP360D4 | 37 (36 bases) | -
Repeat 2, Brevibacillus laterosporus SSP360D4 | 38 (36 bases) | -
Repeat 3, Brevibacillus laterosporus SSP360D4 | 39 (36 bases) | -
Repeat 4, Brevibacillus laterosporus SSP360D4 | 40 (36 bases) | -
Repeat 5, Brevibacillus laterosporus SSP360D4 | 41 (36 bases) | -
Repeat 6, Brevibacillus laterosporus SSP360D4 | 42 (36 bases) | -
Repeat 7, Brevibacillus laterosporus SSP360D4 | 43 (36 bases) | -
Repeat 8, Brevibacillus laterosporus SSP360D4 | 44 (36 bases) | -
Blat-Cas9-dir | 45 (29 bases) | -
Blat-Cas9-rev | 46 (35 bases) | -
Blat sgRNA Direct | 47 (177 bases) | -
Blat sgRNA Reverse | 48 (118 bases) | -
GG-969 oligonucleotide | 49 (68 bases) | -
GG-839 oligonucleotide | 50 (62 bases) | -
TK-149 | 51 (55 bases) | -
TK-150 | 52 (62 bases) | -
GG-840 | 53 (71 bases) | -
GG-841 | 54 (75 bases) | -
TK-124 | 55 (37 bases) | -
TK-151 | 56 (26 bases) | -
TK-126 | 57 (32 bases) | -
GG-935 | 58 (37 bases) | -
GG-936 | 59 (45 bases) | -
pUC-EheD primer | 60 (21 bases) | -
pUC-LguR primer | 61 (22 bases) | -
Sense DNA Strand of Cleaved Sequencing Template | 62 (21 bases) | -
Anti-Sense DNA Strand Sequencing Read | 63 (11 bases) | -
Anti-Sense DNA Strand of Cleaved Sequencing Template | 64 (21 bases) | -
Sense DNA Strand of DNA Sequencing Read | 65 (11 bases) | -
Sense DNA Strand of Target and PAM | 66 (27 bases) | -
Anti-Sense DNA Strand of Target and PAM | 67 (27 bases) | -
“Direct” tracrRNA region downstream of the anti-repeat | 68 (118 bases) | -
“Reverse” tracrRNA region downstream of the anti-repeat | 69 (58 bases) | -
Lactobacillus reuteri Mlc3 (Lreu) Cas9 Open Reading Frame | 70 (4107 bases) | -
Lactobacillus rossiae DSM 15814 (Lros) Cas9 Open Reading Frame | 71 (4110 bases) | -
Pediococcus pentosaceus SL4 (Ppen) Cas9 Open Reading Frame | 72 (4041 bases) | -
Lactobacillus nodensis JCM 14932 (Lnod) Cas9 Open Reading Frame | 73 (3393 bases) | -
Sulfurospirillum sp. SCADC (Sspe) Cas9 Open Reading Frame | 74 (4086 bases) | -
Bifidobacterium thermophilum DSM 20210 (Bthe) Cas9 Open Reading Frame | 75 (3444 bases) | -
Loktanella vestfoldensis (Lves) Cas9 Open Reading Frame | 76 (3216 bases) | -
Sphingomonas sanxanigenens NX02 (Ssan) Cas9 Open Reading Frame | 77 (3318 bases) | -
Epilithonimonas tenax DSM 16811 (Eten) Cas9 Open Reading Frame | 78 (4200 bases) | -
Sporocytophaga myxococcoides (Smyx) Cas9 Open Reading Frame | 79 (4362 bases) | -
Psychroflexus torquis ATCC 700755 (Ptor) Cas9 Open Reading Frame | 80 (4530 bases) | -
Lreu Cas9 Endonuclease | - | 81 (1368 aa)
Lros Cas9 Endonuclease | - | 82 (1369 aa)
Ppen Cas9 Endonuclease | - | 83 (1346 aa)
Lnod Cas9 Endonuclease | - | 84 (1130 aa)
Sspe Cas9 Endonuclease | - | 85 (1361 aa)
Bthe Cas9 Endonuclease | - | 86 (1147 aa)
Lves Cas9 Endonuclease | - | 87 (1071 aa)
Ssan Cas9 Endonuclease | - | 88 (1105 aa)
Eten Cas9 Endonuclease | - | 89 (1399 aa)
Smyx Cas9 Endonuclease | - | 90 (1453 aa)
Ptor Cas9 Endonuclease | - | 91 (1509 aa)
Lreu CRISPR Repeat Consensus | 92 (36 bases) | -
Lros CRISPR Repeat Consensus | 93 (36 bases) | -
Ppen CRISPR Repeat Consensus | 94 (36 bases) | -
Lnod CRISPR Repeat Consensus | 95 (36 bases) | -
Sspe CRISPR Repeat Consensus | 96 (36 bases) | -
Bthe CRISPR Repeat Consensus | 97 (36 bases) | -
Lves CRISPR Repeat Consensus | 98 (36 bases) | -
Ssan CRISPR Repeat Consensus | 99 (36 bases) | -
Eten CRISPR Repeat Consensus | 100 (47 bases) | -
Smyx CRISPR Repeat Consensus | 101 (47 bases) | -
Ptor CRISPR Repeat Consensus | 102 (46 bases) | -
Lreu Anti-Repeat | 103 (36 bases) | -
Lros Anti-Repeat | 104 (37 bases) | -
Ppen Anti-Repeat | 105 (37 bases) | -
Lnod Anti-Repeat | 106 (38 bases) | -
Sspe Anti-Repeat | 107 (39 bases) | -
Bthe Anti-Repeat | 108 (36 bases) | -
Lves Anti-Repeat | 109 (36 bases) | -
Ssan Anti-Repeat | 110 (36 bases) | -
Eten Anti-Repeat | 111 (47 bases) | -
Smyx Anti-Repeat | 112 (47 bases) | -
Ptor Anti-Repeat | 113 (46 bases) | -
Lreu Single guide RNA | 114 (169 bases) | -
Lros Single guide RNA | 115 (166 bases) | -
Ppen Single guide RNA | 116 (168 bases) | -
Lnod Single guide RNA | 117 (114 bases) | -
Sspe Single guide RNA | 118 (180 bases) | -
Sspe Single guide RNA | 119 (117 bases) | -
Bthe Single guide RNA | 120 (254 bases) | -
Lves Single guide RNA | 121 (200 bases) | -
Ssan Single guide RNA | 122 (195 bases) | -
Eten Single guide RNA | 123 (155 bases) | -
Smyx Single guide RNA | 124 (149 bases) | -
Ptor Single guide RNA | 125 (155 bases) | -
GG-939 | 126 (57 bases) | -
Single guide RNA | 127 (174 bases) | -
Lreu Single guide RNA | 128 (166 bases) | -
Lros Single guide RNA | 129 (163 bases) | -
Ppen Single guide RNA | 130 (165 bases) | -
Lnod Single guide RNA | 131 (111 bases) | -
Sspe Single guide RNA | 132 (177 bases) | -
Sspe Single guide RNA | 133 (114 bases) | -
Bthe Single guide RNA | 134 (251 bases) | -
Lves Single guide RNA | 135 (197 bases) | -
Ssan Single guide RNA | 136 (192 bases) | -
Eten Single guide RNA | 137 (152 bases) | -
Smyx Single guide RNA | 138 (146 bases) | -
Ptor Single guide RNA | 139 (152 bases) | -
Cas9 endonuclease Brevibacillus laterosporus bacterial strain SSP360D4 | - | 140 (1092 aa)
Variable Targeting domain-direct | 141 | -
Variable Targeting domain-reverse | 142 | -
16 nt loop of the repeat-direct | 143 | -
16 nt loop of the repeat-reverse | 144 | -
anti-repeat region-direct | 145 | -
anti-repeat region-reverse | 146 | -
Putative 3′ tracrRNA Sequence - direct | 147 | -
Putative 3′ tracrRNA Sequence - reverse | 148 | -
Lactobacillus reuteri Mlc3 (Lreu) crRNA repeat region | 149 | -
Lactobacillus rossiae DSM 15814 (Lros) crRNA repeat region | 150 | -
Pediococcus pentosaceus SL4 (Ppen) crRNA repeat region | 151 | -
Lactobacillus nodensis JCM 14932 (Lnod) crRNA repeat region | 152 | -
Sulfurospirillum sp. SCADC (Sspe) crRNA repeat region | 153-154 | -
Bifidobacterium thermophilum DSM 20210 (Bthe) crRNA repeat region | 155 | -
Loktanella vestfoldensis (Lves) crRNA repeat region | 156 | -
Sphingomonas sanxanigenens NX02 (Ssan) crRNA repeat region | 157 | -
Epilithonimonas tenax DSM 16811 (Eten) crRNA repeat region | 158 | -
Sporocytophaga myxococcoides (Smyx) crRNA repeat region | 159 | -
Psychroflexus torquis ATCC 700755 (Ptor) crRNA repeat region | 160 | -
Lactobacillus reuteri Mlc3 (Lreu) tracrRNA anti-repeat | 161 | -
Lactobacillus rossiae DSM 15814 (Lros) tracrRNA anti-repeat | 162 | -
Pediococcus pentosaceus SL4 (Ppen) tracrRNA anti-repeat | 163 | -
Lactobacillus nodensis JCM 14932 (Lnod) tracrRNA anti-repeat | 164 | -
Sulfurospirillum sp. SCADC (Sspe) tracrRNA anti-repeat | 165-166 | -
Bifidobacterium thermophilum DSM 20210 (Bthe) tracrRNA anti-repeat | 167 | -
Loktanella vestfoldensis (Lves) tracrRNA anti-repeat | 168 | -
Sphingomonas sanxanigenens NX02 (Ssan) tracrRNA anti-repeat | 169 | -
Epilithonimonas tenax DSM 16811 (Eten) tracrRNA anti-repeat | 170 | -
Sporocytophaga myxococcoides (Smyx) tracrRNA anti-repeat | 171 | -
Psychroflexus torquis ATCC 700755 (Ptor) tracrRNA anti-repeat | 172 | -
Lactobacillus reuteri Mlc3 (Lreu) 3′ tracrRNA | 173 | -
Lactobacillus rossiae DSM 15814 (Lros) 3′ tracrRNA | 174 | -
Pediococcus pentosaceus SL4 (Ppen) 3′ tracrRNA | 175 | -
Lactobacillus nodensis JCM 14932 (Lnod) 3′ tracrRNA | 176 | -
Sulfurospirillum sp. SCADC (Sspe) 3′ tracrRNA | 177-178 | -
Bifidobacterium thermophilum DSM 20210 (Bthe) 3′ tracrRNA | 179 | -
Loktanella vestfoldensis (Lves) 3′ tracrRNA | 180 | -
Sphingomonas sanxanigenens NX02 (Ssan) 3′ tracrRNA | 181 | -
Epilithonimonas tenax DSM 16811 (Eten) 3′ tracrRNA | 182 | -
Sporocytophaga myxococcoides (Smyx) 3′ tracrRNA | 183 | -
Psychroflexus torquis ATCC 700755 (Ptor) 3′ tracrRNA | 184 | -

DETAILED DESCRIPTION

Compositions and methods are provided for rapid characterization of Cas endonuclease systems and the elements comprising such systems, including, but not limited to, rapid characterization of PAM sequences, guide RNA elements and Cas endonucleases. Cas9 endonuclease systems originating from Brevibacillus laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811, and Sporocytophaga myxococcoides are described herein.

The present disclosure also describes methods for genome modification of a target sequence in the genome of a cell, for gene editing, and for inserting a polynucleotide of interest into the genome of a cell.

CRISPR (clustered regularly interspaced short palindromic repeats) loci refer to certain genetic loci encoding factors of DNA cleavage systems, for example, those used by bacterial and archaeal cells to destroy foreign DNA (Horvath and Barrangou, 2010, Science 327:167-170). A CRISPR locus can consist of a CRISPR array, comprising short direct repeats separated by short variable DNA sequences (called ‘spacers’), which can be flanked by diverse Cas (CRISPR-associated) genes. Multiple CRISPR-Cas systems have been described, including Class 1 systems, with multisubunit effector complexes, and Class 2 systems, with single protein effectors (such as, but not limited to, Cas9, Cpf1, C2c1, C2c2, C2c3) (Zetsche et al., 2015, Cell 163, 1-13; Shmakov et al., 2015, Molecular Cell 60, 1-13; Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15; WO 2013/176772 A1 published on Nov. 23, 2013 and incorporated in its entirety by reference herein).

The type II CRISPR/Cas system from bacteria employs a crRNA (CRISPR RNA) and tracrRNA (trans-activating CRISPR RNA) to guide a Cas9 endonuclease (encoded by a cas9 gene) to its DNA target. The crRNA contains a spacer region complementary to one strand of the double strand DNA target and a region that base pairs with the tracrRNA, forming an RNA duplex that directs the Cas9 endonuclease to cleave the DNA target. Spacers are acquired through a not fully understood process involving Cas1 and Cas2 proteins. All type II CRISPR-Cas loci contain cas1 and cas2 genes in addition to the cas9 gene (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15). Type II CRISPR-Cas loci can encode a tracrRNA, which is partially complementary to the repeats within the respective CRISPR array, and can comprise other proteins such as Csn1 and Csn2. The presence of cas9 in the vicinity of cas1 and cas2 genes is the hallmark of type II loci (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15).
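The partial complementarity between a tracrRNA anti-repeat and the CRISPR repeat can be estimated with a simple ungapped, antiparallel comparison. The Python sketch below is a minimal illustration under stated assumptions: the function name `paired_fraction` is hypothetical, G-U wobble pairs are counted as pairs, no bulges or gaps are modeled, and the example sequences are placeholders, not sequences from the Sequence Listing.

```python
# Watson-Crick pairs plus G-U wobble pairs, which are common in RNA duplexes.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_fraction(repeat, anti_repeat):
    """Fraction of positions that pair when the anti-repeat is aligned
    antiparallel to the repeat (ungapped; 5' end of the repeat is placed
    against the 3' end of the anti-repeat)."""
    repeat = repeat.upper().replace("T", "U")
    # Reverse the anti-repeat so that index i of each string faces the other.
    anti = anti_repeat.upper().replace("T", "U")[::-1]
    compared = min(len(repeat), len(anti))
    if compared == 0:
        return 0.0
    paired = sum((a, b) in PAIRS for a, b in zip(repeat, anti))
    return paired / compared


# Placeholder sequences for illustration only.
print(paired_fraction("GUUUUAGA", "UCUAAAAC"))  # fully complementary pair
print(paired_fraction("GUUUUAGA", "UCUAAAAA"))  # one mismatch at the 5' G
```

A real analysis would typically use an RNA folding or co-folding tool, since repeat:anti-repeat duplexes often contain bulges that an ungapped comparison cannot represent.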

The number of CRISPR-associated genes at a given CRISPR locus can vary between species (Haft et al., 2005, Computational Biology, PLoS Comput Biol 1(6): e60. doi:10.1371/journal.pcbi.0010060; Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15; WO 2013/176772 A1 published on Nov. 23, 2013 and incorporated in its entirety by reference herein).

The term “Cas gene” herein refers to a gene that is generally coupled to, associated with, close to, or in the vicinity of flanking CRISPR loci. The terms “Cas gene” and “CRISPR-associated (Cas) gene” are used interchangeably herein.

The term “Cas endonuclease” herein refers to a protein encoded by a Cas gene. A Cas endonuclease herein, when in complex with a suitable polynucleotide component, is capable of recognizing, binding to, and optionally nicking or cleaving all or part of a specific DNA target sequence. A Cas endonuclease described herein comprises one or more nuclease domains. Cas endonucleases of the disclosure include those having a HNH or HNH-like nuclease domain and/or a RuvC or RuvC-like nuclease domain. A Cas endonuclease of the disclosure includes a Cas9 protein, a Cpf1 protein, a C2c1 protein, a C2c2 protein, a C2c3 protein, Cas3, Cas5, Cas7, Cas8, Cas10, or complexes of these.

As used herein, the terms “guide polynucleotide/Cas endonuclease complex”, “guide polynucleotide/Cas endonuclease system”, “guide polynucleotide/Cas complex”, and “guide polynucleotide/Cas system” are used interchangeably and refer to at least one guide polynucleotide and at least one Cas endonuclease that are capable of forming a complex, wherein said guide polynucleotide/Cas endonuclease complex can direct the Cas endonuclease to a DNA target site, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double strand break into) the DNA target site. A guide polynucleotide/Cas endonuclease complex herein can comprise Cas protein(s) and suitable polynucleotide component(s) of any of the four known CRISPR systems (Horvath and Barrangou, Science 327:167-170) such as a type I, II, or III CRISPR system. A Cas endonuclease unwinds the DNA duplex at the target sequence and optionally cleaves at least one DNA strand, as mediated by recognition of the target sequence by a polynucleotide (such as, but not limited to, a crRNA or guide RNA) that is in complex with the Cas protein. Such recognition and cutting of a target sequence by a Cas endonuclease typically occurs if the correct protospacer-adjacent motif (PAM) is located at or adjacent to the 3′ end of the DNA target sequence. Alternatively, a Cas protein herein may lack DNA cleavage or nicking activity, but can still specifically bind to a DNA target sequence when complexed with a suitable RNA component. (See also U.S. Patent Application US 2015-0082478 A1, published on Mar. 19, 2015 and US 2015-0059010 A1, published on Feb. 26, 2015, both of which are hereby incorporated in their entirety by reference).
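The PAM requirement described above can be illustrated by scanning a DNA strand for protospacers that are immediately followed, on their 3′ side, by a sequence matching a degenerate PAM written in IUPAC codes (for example the NGG, NGGNG, NNAGAAW and NNNNCND preferences named in the figure descriptions). This Python sketch is illustrative only: the function names, the default 20 nt protospacer length, and the example DNA are assumptions, and only the strand given is scanned.

```python
# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(pam_pattern, sequence):
    """Return True if sequence satisfies the degenerate PAM pattern."""
    if len(pam_pattern) != len(sequence):
        return False
    return all(base in IUPAC[code] for code, base in zip(pam_pattern, sequence))


def find_targets(dna, pam_pattern, protospacer_len=20):
    """Yield (start, protospacer, pam) for each site on the given strand where
    a matching PAM lies immediately 3' of the protospacer."""
    dna = dna.upper()
    pam_len = len(pam_pattern)
    for start in range(len(dna) - protospacer_len - pam_len + 1):
        pam_start = start + protospacer_len
        pam = dna[pam_start:pam_start + pam_len]
        if pam_matches(pam_pattern, pam):
            yield start, dna[start:pam_start], pam


# Hypothetical example: one 20 nt protospacer followed by a TGG PAM.
example = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
hits = list(find_targets(example, "NGG"))
print(hits)
```

Scanning the reverse complement of the same DNA with the same function would cover sites on the opposite strand.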

A guide polynucleotide/Cas endonuclease complex can cleave one or both strands of a DNA target sequence. A guide polynucleotide/Cas endonuclease complex that can cleave both strands of a DNA target sequence typically comprises a Cas protein that has all of its endonuclease domains in a functional state (e.g., wild type endonuclease domains or variants thereof retaining some or all activity in each endonuclease domain). Thus, a wild type Cas protein (e.g., a Cas9 protein disclosed herein), or a variant thereof retaining some or all activity in each endonuclease domain of the Cas protein, is a suitable example of a Cas endonuclease that can cleave both strands of a DNA target sequence. A Cas9 protein comprising functional RuvC and HNH nuclease domains is an example of a Cas protein that can cleave both strands of a DNA target sequence. A guide polynucleotide/Cas endonuclease complex that can cleave one strand of a DNA target sequence can be characterized herein as having nickase activity (e.g., partial cleaving capability). A Cas nickase typically comprises one functional endonuclease domain that allows the Cas to cleave only one strand (i.e., make a nick) of a DNA target sequence. For example, a Cas9 nickase may comprise (i) a mutant, dysfunctional RuvC domain and (ii) a functional HNH domain (e.g., wild type HNH domain). As another example, a Cas9 nickase may comprise (i) a functional RuvC domain (e.g., wild type RuvC domain) and (ii) a mutant, dysfunctional HNH domain. Non-limiting examples of Cas9 nickases suitable for use herein are disclosed by Gasiunas et al. (Proc. Natl. Acad. Sci. U.S.A. 109:E2579-E2586), Jinek et al. (Science 337:816-821), Sapranauskas et al. (Nucleic Acids Res. 39:9275-9282) and in U.S. Patent Appl. Publ. No. 2014/0189896, which are incorporated herein by reference.

A pair of Cas9 nickases can be used to increase the specificity of DNA targeting. In general, this can be done by providing two Cas9 nickases that, by virtue of being associated with RNA components with different guide sequences, target and nick nearby DNA sequences on opposite strands in the region for desired targeting. Such nearby cleavage of each DNA strand creates a double strand break (i.e., a DSB with single-stranded overhangs), which is then recognized as a substrate for non-homologous-end-joining, NHEJ (leading to indel formation) or homologous recombination, HR. The nicks in these embodiments can be at least about 5, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 (or any integer between 5 and 100) bases apart from each other, for example. One or two Cas9 nickase proteins herein can be used in a Cas9 nickase pair. For example, a Cas9 nickase with a mutant RuvC domain, but functioning HNH domain (i.e., Cas9 HNH+/RuvC−), could be used (e.g., Streptococcus pyogenes Cas9 HNH+/RuvC−). Each Cas9 nickase (e.g., Cas9 HNH+/RuvC−) would be directed to specific DNA sites nearby each other (up to 100 base pairs apart) by using suitable RNA components herein with guide RNA sequences targeting each nickase to each specific DNA site.
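The geometric condition for a nickase pair (nicks on opposite strands, separated by roughly 5 to 100 bases) can be written as a small check. This Python sketch is a hedged illustration: the function name, the tuple representation of a nick as (position, strand), and the default bounds taken from the range mentioned above are all assumptions made for the example.

```python
def nicks_form_paired_break(nick_a, nick_b, min_gap=5, max_gap=100):
    """Return True if two nicks could act as a nickase pair.

    Each nick is a (position, strand) tuple, where strand is '+' or '-'.
    The nicks must lie on opposite strands and be min_gap..max_gap bases apart.
    """
    pos_a, strand_a = nick_a
    pos_b, strand_b = nick_b
    if strand_a not in "+-" or strand_b not in "+-" or not strand_a or not strand_b:
        raise ValueError("strand must be '+' or '-'")
    if strand_a == strand_b:
        return False  # nicks on the same strand do not yield a double strand break
    return min_gap <= abs(pos_a - pos_b) <= max_gap


print(nicks_form_paired_break((100, "+"), (140, "-")))  # opposite strands, 40 apart
print(nicks_form_paired_break((100, "+"), (140, "+")))  # same strand
```

The bounds are parameters so that a narrower window can be tried without changing the function.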

A Cas protein can be part of a fusion protein comprising one or more heterologous protein domains (e.g., 1, 2, 3, or more domains in addition to the Cas protein). Such a fusion protein may comprise any additional protein sequence, and optionally a linker sequence between any two domains, such as between Cas and a first heterologous domain. Examples of protein domains that may be fused to a Cas protein herein include, without limitation, epitope tags (e.g., histidine [His], V5, FLAG, influenza hemagglutinin [HA], myc, VSV-G, thioredoxin [Trx]), reporters (e.g., glutathione-S-transferase [GST], horseradish peroxidase [HRP], chloramphenicol acetyltransferase [CAT], beta-galactosidase, beta-glucuronidase [GUS], luciferase, green fluorescent protein [GFP], HcRed, DsRed, cyan fluorescent protein [CFP], yellow fluorescent protein [YFP], blue fluorescent protein [BFP]), and domains having one or more of the following activities: methylase activity, demethylase activity, transcription activation activity (e.g., VP16 or VP64), transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity and nucleic acid binding activity. A Cas protein can also be in fusion with a protein that binds DNA molecules or other molecules, such as maltose binding protein (MBP), S-tag, Lex A DNA binding domain (DBD), GAL4A DNA binding domain, and herpes simplex virus (HSV) VP16.

A guide polynucleotide/Cas endonuclease complex in certain embodiments can bind to a DNA target site sequence, but does not cleave any strand at the target site sequence. Such a complex may comprise a Cas protein in which all of its nuclease domains are mutant and dysfunctional. For example, a Cas9 protein herein that can bind to a DNA target site sequence, but does not cleave any strand at the target site sequence, may comprise both a mutant, dysfunctional RuvC domain and a mutant, dysfunctional HNH domain. A Cas protein herein that binds, but does not cleave, a target DNA sequence can be used to modulate gene expression, for example, in which case the Cas protein could be fused with a transcription factor (or portion thereof) (e.g., a repressor or activator, such as any of those disclosed herein).

In one embodiment, the Cas endonuclease gene is a Type II Cas9 endonuclease, such as but not limited to, Cas9 genes listed in SEQ ID NOs: 462, 474, 489, 494, 499, 505, and 518 of WO2007/025097 published Mar. 1, 2007, and incorporated herein by reference. In another embodiment, the Cas endonuclease gene is a plant, maize or soybean optimized Cas9 endonuclease gene. The Cas endonuclease gene can be operably linked to a SV40 nuclear targeting signal upstream of the Cas codon region and a bipartite VirD2 nuclear localization signal (Tinland et al. (1992) Proc. Natl. Acad. Sci. USA 89:7442-6) downstream of the Cas codon region.

“Cas9” (formerly referred to as Cas5, Csn1, or Csx12) herein refers to a Cas endonuclease of a type II CRISPR system that forms a complex with a crNucleotide and a tracrNucleotide, or with a single guide polynucleotide, for specifically recognizing and cleaving all or part of a DNA target sequence. Cas9 protein comprises a RuvC nuclease domain and an HNH (H-N-H) nuclease domain, each of which can cleave a single DNA strand at a target sequence (the concerted action of both domains leads to DNA double-strand cleavage, whereas activity of one domain leads to a nick). In general, the RuvC domain comprises subdomains I, II and III, where subdomain I is located near the N-terminus of Cas9 and subdomains II and III are located in the middle of the protein, flanking the HNH domain (Hsu et al., Cell 157:1262-1278).

Cas9 endonucleases are typically derived from a type II CRISPR system, which includes a DNA cleavage system utilizing a Cas9 endonuclease in complex with at least one polynucleotide component. For example, a Cas9 can be in complex with a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA). In another example, a Cas9 can be in complex with a single guide RNA.

In one embodiment of the disclosure, the composition comprises at least one Cas9 endonuclease selected from the group consisting of SEQ ID NOs: 81-91, or a functional fragment thereof.

In one embodiment of the disclosure, the composition comprises at least one recombinant DNA vector encoding the Cas9 endonuclease selected from the group consisting of SEQ ID NOs: 81-91 (such as the DNA sequences from SEQ ID NOs: 70-80), or mRNA encoding the Cas9 endonuclease selected from the group consisting of SEQ ID NOs: 81-91. The Cas9 endonuclease selected from the group consisting of SEQ ID NOs: 81-91 can form a ribonucleoprotein (RNP) complex with at least one guide RNA, wherein said complex is capable of recognizing, binding to, and optionally nicking or cleaving all or part of a target site.

Recombinant DNA expressing the Cas9 endonucleases described herein (including functional fragments thereof, plant or microbe codon optimized Cas9 endonuclease) can be stably integrated into the genome of an organism. For example, plants can be produced that comprise a cas9 gene stably integrated in the plant's genome. Plants expressing a stably integrated Cas endonuclease can be exposed to at least one guide RNA and/or polynucleotide modification templates and/or donor DNAs to enable genome modifications such as gene knockout, gene editing or DNA insertions.

A variant of a Cas9 protein sequence may be used, but should have specific binding activity, and optionally endonucleolytic activity, toward DNA when associated with an RNA component herein. Such a variant may comprise an amino acid sequence that is at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to the amino acid sequence of the reference Cas9. Alternatively, a Cas9 protein may comprise an amino acid sequence that is at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any of the foregoing amino acid sequences, for example. Such a variant Cas9 protein should have specific binding activity, and optionally cleavage or nicking activity, toward DNA when associated with an RNA component herein.

The Cas endonuclease can comprise a modified form of the Cas9 polypeptide. The modified form of the Cas9 polypeptide can include an amino acid change (e.g., deletion, insertion, or substitution) that reduces the naturally-occurring nuclease activity of the Cas9 protein. For example, in some instances, the modified form of the Cas9 protein has less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, less than 5%, or less than 1% of the nuclease activity of the corresponding wild-type Cas9 polypeptide (US patent application US20140068797 A1 published on Mar. 6, 2014). In some cases, the modified form of the Cas9 polypeptide has no substantial nuclease activity and is referred to as catalytically “inactivated Cas9” or “deactivated Cas9 (dCas9).” Catalytically inactivated Cas9 variants include Cas9 variants that contain mutations in the HNH and RuvC nuclease domains. These catalytically inactivated Cas9 variants are capable of interacting with sgRNA and binding to the target site in vivo but cannot cleave either strand of the target DNA.

A catalytically inactive Cas9 can be fused to a heterologous sequence (US patent application US20140068797 A1 published on Mar. 6, 2014). Suitable fusion partners include, but are not limited to, a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target DNA or on a polypeptide (e.g., a histone or other DNA-binding protein) associated with the target DNA. Additional suitable fusion partners include, but are not limited to, a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity. Further suitable fusion partners include, but are not limited to, a polypeptide that directly provides for increased transcription of the target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription regulator, etc.). A catalytically inactive Cas9 can also be fused to a FokI nuclease to generate double strand breaks (Guilinger et al. Nature Biotechnology, volume 32, number 6, June 2014).

A Cas protein herein such as a Cas9 endonuclease protein can comprise a heterologous nuclear localization sequence (NLS). A heterologous NLS amino acid sequence herein may be of sufficient strength to drive accumulation of a Cas protein in a detectable amount in the nucleus of a yeast cell herein, for example. An NLS may comprise one (monopartite) or more (e.g., bipartite) short sequences (e.g., 2 to 20 residues) of basic, positively charged residues (e.g., lysine and/or arginine), and can be located anywhere in a Cas amino acid sequence but such that it is exposed on the protein surface. An NLS may be operably linked to the N-terminus or C-terminus of a Cas protein herein, for example. Two or more NLS sequences can be linked to a Cas protein, for example, such as on both the N- and C-termini of a Cas protein. The Cas endonuclease gene can be operably linked to a SV40 nuclear targeting signal upstream of the Cas codon region and a bipartite VirD2 nuclear localization signal (Tinland et al. (1992) Proc. Natl. Acad. Sci. USA 89:7442-6) downstream of the Cas codon region. Non-limiting examples of suitable NLS sequences herein include those disclosed in U.S. Pat. No. 7,309,576, which is incorporated herein by reference.

The terms “functional fragment”, “fragment that is functionally equivalent” and “functionally equivalent fragment” of a Cas endonuclease are used interchangeably herein, and refer to a portion or subsequence of the Cas endonuclease sequence of the present disclosure in which the ability to recognize, bind to, and optionally nick or cleave (introduce a single or double strand break in) the target site is retained.

The terms “functional variant”, “variant that is functionally equivalent” and “functionally equivalent variant” of a Cas endonuclease are used interchangeably herein, and refer to a variant of the Cas endonuclease of the present disclosure in which the ability to recognize, bind to, and optionally nick or cleave (introduce a single or double strand break in) the target site is retained. Fragments and variants can be obtained via methods such as site-directed mutagenesis and synthetic construction.

In one embodiment, the Cas endonuclease gene is a plant codon optimized Streptococcus pyogenes Cas9 gene whose encoded endonuclease can recognize, and thus in principle target, any genomic sequence of the form N(12-30)NGG.
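As an illustration of the N(12-30)NGG target form, the following minimal sketch scans the forward strand of a DNA string for protospacers of a chosen length followed by an NGG motif. The function name, the default 20-nucleotide protospacer length, and the forward-strand-only search are assumptions made for this example; a practical search would also scan the reverse complement.

```python
import re


def find_ngg_targets(seq: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) tuples for every forward-strand
    site of the form N(protospacer_len) followed by NGG."""
    if not 12 <= protospacer_len <= 30:
        raise ValueError("protospacer_len must be between 12 and 30")
    seq = seq.upper()
    # A zero-width lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]
```

Each returned tuple gives the 0-based start of the protospacer, the protospacer itself, and the three-nucleotide NGG motif that immediately follows it.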

In one embodiment, the Cas endonuclease is a Cas9 endonuclease originating from an organism selected from the group consisting of Brevibacillus laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755, wherein said Cas9 endonuclease can form a guide RNA/Cas endonuclease complex capable of recognizing, binding to, and optionally nicking or cleaving all or part of a DNA target sequence.

The Cas endonuclease can be introduced directly into a cell by any method known in the art, for example, but not limited to, transient introduction methods, transfection and/or topical application.

The guide polynucleotides and guide polynucleotide/Cas endonuclease systems described herein include guide polynucleotides comprising a crRNA (comprising a variable targeting (VT) domain linked to a tracr-mate sequence that can hybridize to the tracr nucleotide) wherein said guide polynucleotide directs sequence-specific binding of the guide polynucleotide/Cas endonuclease complex to a target sequence in a eukaryotic cell. In an aspect, the guide polynucleotide targets a target sequence in a non-human eukaryotic organism, preferably a multicellular eukaryotic organism, comprising a eukaryotic host cell. In one aspect, the guide polynucleotide is a non-naturally occurring guide polynucleotide or a guide polynucleotide targeting a target sequence that is not natural to bacteria. The disclosed guide polynucleotides can be reprogrammed to target nucleotide sequences in non-bacterial cells, for example, but not limited to, by changing the VT domain to target non-bacterial target sequences and sequences not naturally acquired by the system from which the crRNA was obtained. Alternatively, the VT domain can be programmed to guide the crRNA to a target sequence in a eukaryotic genome. Any sequence in a eukaryotic genome can be targeted using the disclosed guide polynucleotides, such as mammalian (e.g., human, mouse, etc.), yeast, insect, animal, and plant sequences. In other embodiments, the VT domain can be programmed to guide the crRNA to a target sequence in a prokaryotic genome or bacterial plasmid sequence that is not naturally targeted by the native system.

In some embodiments, the guide polynucleotide/Cas endonuclease complex comprises one or more nuclear localization sequences of sufficient strength to drive accumulation of said complex in a detectable amount in the nucleus of a eukaryotic cell. For example, nuclear localization signals can be added to the N-terminus, the C-terminus, or both the N- and C-termini of the Cas protein. In other embodiments, one or more cellular localization signals can be included in the complex to provide for accumulation of the complex in a detectable amount in cellular organelles in which a desired target sequence is contained. For example, chloroplast targeting sequences can be added to the Cas protein to provide accumulation in a chloroplast organelle in a plant cell where the desired target sequence is found in the plant chloroplast genome.

The guide polynucleotide/Cas endonuclease system described herein can be provided to eukaryotic cells and reprogrammed to facilitate cleavage of endogenous eukaryotic target polynucleotides.

Endonucleases are enzymes that cleave the phosphodiester bond within a polynucleotide chain, and include restriction endonucleases that cleave DNA at specific sites without damaging the bases. Restriction endonucleases include Type I, Type II, Type III, and Type IV endonucleases, which further include subtypes. In the Type I and Type III systems, both the methylase and restriction activities are contained in a single complex. Endonucleases also include meganucleases, also known as homing endonucleases (HEases), which, like restriction endonucleases, bind and cut at a specific recognition site; however, the recognition sites for meganucleases are typically longer, about 18 bp or more (patent application WO-PCT PCT/US12/30061 filed on Mar. 22, 2012). Meganucleases have been classified into four families based on conserved sequence motifs: the LAGLIDADG, GIY-YIG, H-N-H, and His-Cys box families. These motifs participate in the coordination of metal ions and hydrolysis of phosphodiester bonds. HEases are notable for their long recognition sites, and for tolerating some sequence polymorphisms in their DNA substrates. The naming convention for meganucleases is similar to the convention for other restriction endonucleases. Meganucleases are also characterized by prefix F-, I-, or PI- for enzymes encoded by free-standing ORFs, introns, and inteins, respectively. One step in the recombination process involves polynucleotide cleavage at or near the recognition site. This cleaving activity can be used to produce a double-strand break. For reviews of site-specific recombinases and their recognition sites, see Sauer (1994) Curr Op Biotechnol 5:521-7; and Sadowski (1993) FASEB 7:760-7. In some examples the recombinase is from the Integrase or Resolvase families.

TAL effector nucleases are a new class of sequence-specific nucleases that can be used to make double-strand breaks at specific target sequences in the genome of a plant or other organism (Miller et al. (2011) Nature Biotechnology 29:143-148). Zinc finger nucleases (ZFNs) are engineered double-strand break inducing agents comprised of a zinc finger DNA binding domain and a double-strand-break-inducing agent domain. Recognition site specificity is conferred by the zinc finger domain, which typically comprises two, three, or four zinc fingers, for example having a C2H2 structure; however, other zinc finger structures are known and have been engineered. Zinc finger domains are amenable for designing polypeptides which specifically bind a selected polynucleotide recognition sequence. ZFNs include an engineered DNA-binding zinc finger domain linked to a non-specific endonuclease domain, for example a nuclease domain from a Type IIs endonuclease such as FokI. Additional functionalities can be fused to the zinc-finger binding domain, including transcriptional activator domains, transcription repressor domains, and methylases. In some examples, dimerization of the nuclease domain is required for cleavage activity. Each zinc finger recognizes three consecutive base pairs in the target DNA. For example, a 3 finger domain recognizes a sequence of 9 contiguous nucleotides; with a dimerization requirement of the nuclease, two sets of zinc finger triplets are used to bind an 18 nucleotide recognition sequence.
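The recognition-length arithmetic stated above (three base pairs per finger, doubled when the nuclease domain must dimerize) can be written out as a short sketch. The function and constant names are illustrative assumptions, and any spacer between the two half-sites is deliberately not counted.

```python
BP_PER_FINGER = 3  # each zinc finger recognizes three consecutive base pairs


def zfn_recognition_length(fingers_per_monomer: int, dimer: bool = True) -> int:
    """Total nucleotides recognized by a zinc finger array, or by a
    pair of identical-sized arrays when dimerization is required."""
    if fingers_per_monomer < 1:
        raise ValueError("at least one zinc finger is required")
    per_monomer = fingers_per_monomer * BP_PER_FINGER
    return 2 * per_monomer if dimer else per_monomer
```

With three fingers, a single array recognizes 9 nucleotides and a dimerized pair recognizes 18, matching the example in the text.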

As used herein, the term “guide polynucleotide” relates to a polynucleotide sequence that can form a complex with a Cas endonuclease and enables the Cas endonuclease to recognize, bind to, and optionally cleave a DNA target site. The guide polynucleotide can be a single molecule or a double molecule. The guide polynucleotide sequence can be a RNA sequence, a DNA sequence, or a combination thereof (a RNA-DNA combination sequence). Optionally, the guide polynucleotide can comprise at least one nucleotide, phosphodiester bond or linkage modification such as, but not limited to, Locked Nucleic Acid (LNA), 5-methyl dC, 2,6-Diaminopurine, 2′-Fluoro A, 2′-Fluoro U, 2′-O-Methyl RNA, phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 (hexaethylene glycol chain) molecule, or 5′ to 3′ covalent linkage resulting in circularization. A guide polynucleotide that solely comprises ribonucleic acids is also referred to as a “guide RNA” or “gRNA” (see also U.S. Patent Application US 2015-0082478 A1, published on Mar. 19, 2015 and US 2015-0059010 A1, published on Feb. 26, 2015, both of which are hereby incorporated in their entirety by reference).

In one embodiment of the disclosure, the guide polynucleotide is a single guide RNA capable of forming a guide RNA/Cas9 endonuclease complex, wherein said guide RNA/Cas9 endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said single guide RNA is selected from the group consisting of SEQ ID NOs: 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138 and 139.

In one embodiment of the disclosure, the guide polynucleotide is a single guide RNA capable of forming a guide RNA/Cas9 endonuclease complex, wherein said guide RNA/Cas9 endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said single guide RNA comprises a chimeric non-naturally occurring crRNA linked to a tracrRNA, wherein said tracrRNA comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183 and 184, wherein said chimeric non-naturally occurring crRNA comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159 and 160.

In one embodiment of the disclosure, the guide polynucleotide is a guide RNA capable of forming a guide RNA/Cas9 endonuclease complex, wherein said guide RNA/Cas9 endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said guide RNA is a duplex molecule comprising a chimeric non-naturally occurring crRNA and a tracrRNA, wherein said tracrRNA comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183 and 184, wherein said chimeric non-naturally occurring crRNA comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159 and 160, wherein said chimeric non-naturally occurring crRNA comprises a variable targeting domain capable of hybridizing to said target sequence.

The guide polynucleotide can be a double molecule (also referred to as duplex guide polynucleotide) comprising a crNucleotide sequence and a tracrNucleotide sequence. The crNucleotide includes a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA and a second nucleotide sequence (also referred to as a tracr mate sequence) that is part of a Cas endonuclease recognition (CER) domain. The tracr mate sequence can hybridize to a tracrNucleotide along a region of complementarity and together form the Cas endonuclease recognition domain or CER domain. The CER domain is capable of interacting with a Cas endonuclease polypeptide. The crNucleotide and the tracrNucleotide of the duplex guide polynucleotide can be RNA, DNA, and/or RNA-DNA-combination sequences. In some embodiments, the crNucleotide molecule of the duplex guide polynucleotide is referred to as “crDNA” (when composed of a contiguous stretch of DNA nucleotides) or “crRNA” (when composed of a contiguous stretch of RNA nucleotides), or “crDNA-RNA” (when composed of a combination of DNA and RNA nucleotides). The crNucleotide can comprise a fragment of the crRNA naturally occurring in Bacteria and Archaea. The size of the fragment of the crRNA naturally occurring in Bacteria and Archaea that can be present in a crNucleotide disclosed herein can range from, but is not limited to, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides. In some embodiments the tracrNucleotide is referred to as “tracrRNA” (when composed of a contiguous stretch of RNA nucleotides) or “tracrDNA” (when composed of a contiguous stretch of DNA nucleotides) or “tracrDNA-RNA” (when composed of a combination of DNA and RNA nucleotides). In one embodiment, the RNA that guides the RNA/Cas9 endonuclease complex is a duplexed RNA comprising a duplex crRNA-tracrRNA. The tracrRNA (trans-activating CRISPR RNA) contains, in the 5′-to-3′ direction, (i) a sequence that anneals with the repeat region of CRISPR type II crRNA and (ii) a stem loop-containing portion (Deltcheva et al., Nature 471:602-607). The duplex guide polynucleotide can form a complex with a Cas endonuclease, wherein said guide polynucleotide/Cas endonuclease complex (also referred to as a guide polynucleotide/Cas endonuclease system) can direct the Cas endonuclease to a genomic target site, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double strand break into) the target site. (See also U.S. Patent Application US 2015-0082478 A1, published on Mar. 19, 2015 and US 2015-0059010 A1, published on Feb. 26, 2015, both of which are hereby incorporated in their entirety by reference.)

The guide polynucleotide can also be a single molecule (also referred to as single guide polynucleotide) comprising a crNucleotide sequence linked to a tracrNucleotide sequence. The single guide polynucleotide comprises a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA and a Cas endonuclease recognition domain (CER domain) that interacts with a Cas endonuclease polypeptide. By “domain” it is meant a contiguous stretch of nucleotides that can be a RNA, DNA, and/or RNA-DNA-combination sequence. The VT domain and/or the CER domain of a single guide polynucleotide can comprise a RNA sequence, a DNA sequence, or a RNA-DNA-combination sequence. The single guide polynucleotide being comprised of sequences from the crNucleotide and the tracrNucleotide may be referred to as “single guide RNA” (when composed of a contiguous stretch of RNA nucleotides) or “single guide DNA” (when composed of a contiguous stretch of DNA nucleotides) or “single guide RNA-DNA” (when composed of a combination of RNA and DNA nucleotides). The single guide polynucleotide can form a complex with a Cas endonuclease, wherein said guide polynucleotide/Cas endonuclease complex (also referred to as a guide polynucleotide/Cas endonuclease system) can direct the Cas endonuclease to a genomic target site, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double strand break in) the target site. (See also U.S. Patent Application US 2015-0082478 A1, published on Mar. 19, 2015 and US 2015-0059010 A1, published on Feb. 26, 2015, both of which are hereby incorporated in their entirety by reference.)

The terms “variable targeting domain” and “VT domain” are used interchangeably herein and include a nucleotide sequence that can hybridize (is complementary) to one strand (nucleotide sequence) of a double strand DNA target site. The percent complementarity between the first nucleotide sequence domain (VT domain) and the target sequence can be at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%. The variable targeting domain can be at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides in length. In some embodiments, the variable targeting domain comprises a contiguous stretch of 12 to 30 nucleotides. The variable targeting domain can be composed of a DNA sequence, a RNA sequence, a modified DNA sequence, a modified RNA sequence, or any combination thereof.
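One way to picture the complementarity percentages above is the following minimal sketch, which scores a VT domain against one strand of a target site. It assumes equal-length, ungapped sequences written 5′ to 3′, antiparallel pairing, and Watson-Crick pairs only (wobble pairs and modified bases are not counted); the function name and scoring rule are illustrative assumptions, not a definition taken from the disclosure.

```python
# Allowed (guide base, target DNA base) Watson-Crick pairs; the guide
# may be RNA (U) or DNA (T).
_PAIRS = {("A", "T"), ("U", "A"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(vt_domain: str, target_strand: str) -> float:
    """Percentage of VT domain positions that base-pair with the target
    strand when the two are aligned antiparallel without gaps."""
    vt = vt_domain.upper()
    target = target_strand.upper()
    if not vt or len(vt) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    # Both inputs are 5'->3', so reverse the target for antiparallel pairing.
    antiparallel = target[::-1]
    paired = sum((g, t) in _PAIRS for g, t in zip(vt, antiparallel))
    return 100.0 * paired / len(vt)
```

A 20-nucleotide VT domain with a single mismatched position would score 95% under this rule.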

The terms “Cas endonuclease recognition domain” and “CER domain” (of a guide polynucleotide) are used interchangeably herein and include a nucleotide sequence that interacts with a Cas endonuclease polypeptide. A CER domain comprises a tracrNucleotide mate sequence followed by a tracrNucleotide sequence. The CER domain can be composed of a DNA sequence, a RNA sequence, a modified DNA sequence, a modified RNA sequence (see for example US 2015-0059010 A1, published on Feb. 26, 2015, incorporated in its entirety by reference herein), or any combination thereof.

The nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can comprise a RNA sequence, a DNA sequence, or a RNA-DNA combination sequence. In one embodiment, the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can be at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100 nucleotides in length. In another embodiment, the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can comprise a tetraloop sequence, such as, but not limited to, a GAAA tetraloop sequence.
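The linear arrangement of a single guide RNA (VT domain, tracr mate sequence, linker, tracrNucleotide) can be sketched as a simple concatenation. The function name is hypothetical, the sequences used with it below are toy placeholders rather than any SEQ ID NO of the disclosure, and the checks only reflect the length ranges recited in this section (a 12 to 30 nucleotide VT domain and a linker of at least 3 nucleotides, GAAA by default).

```python
_RNA_ALPHABET = set("ACGU")


def assemble_single_guide(vt_domain: str, tracr_mate: str, tracr: str,
                          linker: str = "GAAA") -> str:
    """Concatenate, 5' to 3', a VT domain, a tracr mate sequence,
    a linker (loop) and a tracrRNA into one single guide RNA string."""
    parts = {"vt_domain": vt_domain, "tracr_mate": tracr_mate,
             "linker": linker, "tracr": tracr}
    cleaned = {}
    for name, sequence in parts.items():
        sequence = sequence.upper()
        if not sequence or set(sequence) - _RNA_ALPHABET:
            raise ValueError(f"{name} must be a non-empty RNA sequence")
        cleaned[name] = sequence
    if not 12 <= len(cleaned["vt_domain"]) <= 30:
        raise ValueError("vt_domain must be 12 to 30 nucleotides long")
    if len(cleaned["linker"]) < 3:
        raise ValueError("linker must be at least 3 nucleotides long")
    return (cleaned["vt_domain"] + cleaned["tracr_mate"]
            + cleaned["linker"] + cleaned["tracr"])
```

For instance, `assemble_single_guide("G" * 20, "GUUUUAGA", "UCUAAAAC")` returns a 40-nucleotide string in which the GAAA loop joins the toy tracr mate and tracrRNA segments.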

Nucleotide sequence modification of the guide polynucleotide, VT domain and/or CER domain can be selected from, but not limited to, the group consisting of a 5′ cap, a 3′ polyadenylated tail, a riboswitch sequence, a stability control sequence, a sequence that forms a dsRNA duplex, a modification or sequence that targets the guide polynucleotide to a subcellular location, a modification or sequence that provides for tracking, a modification or sequence that provides a binding site for proteins, a Locked Nucleic Acid (LNA), a 5-methyl dC nucleotide, a 2,6-Diaminopurine nucleotide, a 2′-Fluoro A nucleotide, a 2′-Fluoro U nucleotide, a 2′-O-Methyl RNA nucleotide, a phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 molecule, a 5′ to 3′ covalent linkage, or any combination thereof. These modifications can result in at least one additional beneficial feature, wherein the additional beneficial feature is selected from the group of a modified or regulated stability, a subcellular targeting, tracking, a fluorescent label, a binding site for a protein or protein complex, modified binding affinity to complementary target sequence, modified resistance to cellular degradation, and increased cellular permeability.

The terms “a polynucleotide originating from organism” and “a polynucleotide derived from organism” are used interchangeably herein and refer to a polynucleotide (such as but not limited to crRNA and tracrRNA) that is naturally occurring in said organism (native to said organism) or is isolated from said organism, or is a synthetic oligonucleotide that is identical to the polynucleotide isolated from said organism. For example, a tracrRNA originating from Brevibacillus laterosporus refers to a tracrRNA that occurs in Brevibacillus laterosporus, or is isolated from Brevibacillus laterosporus, or is a synthetic oligonucleotide that is identical to the tracrRNA isolated from Brevibacillus laterosporus.

The terms “functional fragment”, “fragment that is functionally equivalent” and “functionally equivalent fragment” of a guide RNA, crRNA or tracrRNA are used interchangeably herein, and refer to a portion or subsequence of the guide RNA, crRNA or tracrRNA, respectively, of the present disclosure in which the ability to function as a guide RNA, crRNA or tracrRNA, respectively, is retained.

The terms “functional variant”, “variant that is functionally equivalent” and “functionally equivalent variant” of a guide RNA, crRNA or tracrRNA (respectively) are used interchangeably herein, and refer to a variant of the guide RNA, crRNA or tracrRNA, respectively, of the present disclosure in which the ability to function as a guide RNA, crRNA or tracrRNA, respectively, is retained.

As used herein, the terms “single guide RNA” and “sgRNA” are used interchangeably and relate to a synthetic fusion of two RNA molecules, a crRNA (CRISPR RNA) comprising a variable targeting domain (linked to a tracr mate sequence that hybridizes to a tracrRNA), fused to a tracrRNA (trans-activating CRISPR RNA). The single guide RNA can comprise a crRNA or crRNA fragment and a tracrRNA or tracrRNA fragment of the type II CRISPR/Cas system that can form a complex with a type II Cas endonuclease, wherein said guide RNA/Cas endonuclease complex can direct the Cas endonuclease to a genomic target site, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double strand break into) a genomic target site.

The components of the single or dual guide polynucleotides described herein (such as but not limited to the crRNA, tracrRNA, variable targeting domain, crRNA repeat, tracr-mate domain, loop, tracrRNA anti-repeat, 3′ tracrRNA sequence) can be modified to create functional variants of these components such that these functional variants can be combined to create a functional single or dual guide polynucleotide. Examples of guide polynucleotide component modifications are described herein and include nucleotide extensions at the 3′ end, 5′ end, or both ends of any of the components of the guide polynucleotide, and/or nucleotide sequence modifications (substitutions, insertions, deletions), and/or chemical modifications, and/or linkage modifications, or any combinations thereof.

Extensions at the 3′ end, 5′ end, or both ends of any of the components of the guide polynucleotide can be at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100 nucleotides in length.

Nucleotide sequence modifications of the guide polynucleotide components include a 5′ cap, a 3′ polyadenylated tail, a riboswitch sequence, a stability control sequence, a sequence that forms a dsRNA duplex, a modification or sequence that targets the guide polynucleotide to a subcellular location, a modification or sequence that provides for tracking, a modification or sequence that provides a binding site for proteins, a Locked Nucleic Acid (LNA), a 5-methyl dC nucleotide, a 2,6-Diaminopurine nucleotide, a 2′-Fluoro A nucleotide, a 2′-Fluoro U nucleotide, a 2′-O-Methyl RNA nucleotide, a phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 molecule, a 5′ to 3′ covalent linkage, or any combination thereof.

In one aspect, the functional variant single or dual guide polynucleotide has an activity similar to that of the guide polynucleotides of SEQ ID NOs: 127-139. In another aspect, the functional variant single or dual guide polynucleotide has an increased activity when compared to the guide polynucleotides of SEQ ID NOs: 127-139. Guide activity includes the ability of the guide polynucleotide/Cas endonuclease complex to recognize, bind to, and cleave a target site to create a double strand break, and/or the RGEN mutation frequency.

The terms “guide RNA/Cas endonuclease complex”, “guide RNA/Cas endonuclease system”, “guide RNA/Cas complex”, “guide RNA/Cas system”, “gRNA/Cas complex”, “gRNA/Cas system”, “RNA-guided endonuclease”, and “RGEN” are used interchangeably herein and refer to at least one RNA component and at least one Cas endonuclease that are capable of forming a complex, wherein said guide RNA/Cas endonuclease complex can direct the Cas endonuclease to a DNA target site, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double strand break) the DNA target site. A guide RNA/Cas endonuclease complex herein can comprise Cas protein(s) and suitable RNA component(s) of any of the four known CRISPR systems (Horvath and Barrangou, Science 327:167-170) such as a type I, II, or III CRISPR system. A guide RNA/Cas endonuclease complex can comprise a Type II Cas9 endonuclease and at least one RNA component (e.g., a crRNA and tracrRNA, or a gRNA). (See also U.S. Patent Application US 2015-0082478 A1, published on Mar. 19, 2015 and US 2015-0059010 A1, published on Feb. 26, 2015, both of which are hereby incorporated in their entirety by reference.)

The guide polynucleotide can be introduced into a cell transiently, as a single stranded polynucleotide or a double stranded polynucleotide, using any method known in the art such as, but not limited to, particle bombardment, Agrobacterium transformation or topical applications. The guide polynucleotide can also be introduced indirectly into a cell by introducing a recombinant DNA molecule (via methods such as, but not limited to, particle bombardment or Agrobacterium transformation) comprising a heterologous nucleic acid fragment encoding a guide polynucleotide, operably linked to a specific promoter that is capable of transcribing the guide RNA in said cell. The specific promoter can be, but is not limited to, an RNA polymerase III promoter, which allows for transcription of RNA with precisely defined, unmodified, 5′- and 3′-ends (DiCarlo et al., Nucleic Acids Res. 41: 4336-4343; Ma et al., Mol. Ther. Nucleic Acids 3:e161).

The terms “target site”, “target sequence”, “target site sequence”, “target DNA”, “target locus”, “genomic target site”, “genomic target sequence”, “genomic target locus” and “protospacer” are used interchangeably herein and refer to a polynucleotide sequence such as, but not limited to, a nucleotide sequence on a chromosome, episome, or any other DNA molecule in the genome (including chromosomal, chloroplastic, mitochondrial DNA, plasmid DNA) of a cell, at which a guide polynucleotide/Cas endonuclease complex can recognize, bind to, and optionally nick or cleave. The target site can be an endogenous site in the genome of a cell, or alternatively, the target site can be heterologous to the cell and thereby not be naturally occurring in the genome of the cell, or the target site can be found in a heterologous genomic location compared to where it occurs in nature. As used herein, the terms “endogenous target sequence” and “native target sequence” are used interchangeably to refer to a target sequence that is endogenous or native to the genome of a cell and is at the endogenous or native position of that target sequence in the genome of the cell. Cells include, but are not limited to, human, non-human, animal, bacterial, fungal, insect, yeast, non-conventional yeast, and plant cells as well as plants and seeds produced by the methods described herein. The terms “artificial target site” and “artificial target sequence” are used interchangeably herein and refer to a target sequence that has been introduced into the genome of a cell. Such an artificial target sequence can be identical in sequence to an endogenous or native target sequence in the genome of a cell but be located in a different position (i.e., a non-endogenous or non-native position) in the genome of a cell.

The terms “altered target site”, “altered target sequence”, “modified target site”, and “modified target sequence” are used interchangeably herein and refer to a target sequence as disclosed herein that comprises at least one alteration when compared to a non-altered target sequence. Such “alterations” include, for example:

(i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, or (iv) any combination of (i)-(iii).

The length of the target DNA sequence (target site) can vary, and includes, for example, target sites that are at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more nucleotides in length. It is further possible that the target site can be palindromic, that is, the sequence on one strand reads the same in the opposite direction on the complementary strand. The nick/cleavage site can be within the target sequence or the nick/cleavage site could be outside of the target sequence. In another variation, the cleavage could occur at nucleotide positions immediately opposite each other to produce a blunt end cut or, in other cases, the incisions could be staggered to produce single-stranded overhangs, also called “sticky ends”, which can be either 5′ overhangs or 3′ overhangs. Active variants of genomic target sites can also be used. Such active variants can comprise at least 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the given target site, wherein the active variants retain biological activity and hence are capable of being recognized and cleaved by a Cas endonuclease. Assays to measure the single or double-strand break of a target site by an endonuclease are known in the art and generally measure the overall activity and specificity of the agent on DNA substrates containing recognition sites.
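By way of illustration only, the notion of a palindromic target site (one strand reading the same as the complementary strand read in the opposite direction) can be sketched in a few lines of Python. The function names below are hypothetical and are not part of the disclosure; the sketch assumes an unambiguous DNA sequence written with the letters A, C, G and T.

```python
# Illustrative sketch only; helper names are hypothetical.
# Assumes the input contains only the unambiguous DNA letters A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand read in the opposite direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    print(is_palindromic("GAATTC"))   # palindromic example
    print(is_palindromic("GATTACA"))  # non-palindromic example
```

A sequence of odd length cannot satisfy this test, because its central nucleotide would have to be its own complement.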

A “protospacer adjacent motif” (PAM) herein refers to a short nucleotide sequence adjacent to a target sequence (protospacer) that is recognized (targeted) by a guide polynucleotide/Cas endonuclease system described herein. The Cas endonuclease may not successfully recognize a target DNA sequence if the target DNA sequence is not followed by a PAM sequence. The sequence and length of a PAM herein can differ depending on the Cas protein or Cas protein complex used. The PAM sequence can be of any length but is typically 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides long.

The terms “randomized PAM” and “randomized protospacer adjacent motif” are used interchangeably herein, and refer to a random DNA sequence adjacent to a target sequence (protospacer) that is recognized (targeted) by a guide polynucleotide/Cas endonuclease system described herein. The randomized PAM sequence can be of any length but is typically 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides long. A randomized nucleotide includes any one of the nucleotides A, C, G or T.

The PAM sequence plays a key role in target recognition by licensing crRNA-guided base pairing to the protospacer sequence (Szczelkun et al, 2014, Proc. Natl. Acad. Sci. U.S.A 111: 9798-803). A strict PAM requirement constrains DNA target selection and poses a limit to Cas9 genome editing applications. Target site selection may be further confined if unique genomic sites are required, especially in large complex plant genomes like maize (Xie et al, 2014, Mol. Plant 7: 923-6). These constraints imposed by the PAM and the specificity of the Spy Cas9 can be overcome by systematically redesigning the PAM specificity of a single Cas9 protein (Kleinstiver et al, 2015, Nature 523, 481-485). Described herein is a different method to overcome constraints imposed by the PAM and the specificity of the Cas9, namely by exploring the natural diversity of Cas9 proteins. The method described herein can also be combined with the method of systematically redesigning the PAM specificity to overcome constraints imposed by the PAM and the specificity of the Cas endonucleases.

Cas9 proteins from different bacteria recognize different PAM sequences (Horvath et al, 2008, J. Bacteriol. 190: 1401-12; Jinek et al, 2012, Science 337: 816-21; Gasiunas et al, 2012, Cell 154: 442-451; Zhang et al, 2013, Cell 50: 488-503; Fonfara et al, 2014, Nucleic Acids Res. 42: 2577-2590). Typically, the PAM sequences of new Cas9 proteins are identified by computational analysis of sequences immediately flanking putative protospacers in bacteriophage genomes (Shah et al, 2013, RNA Biol. 10: 1-9). Currently, with >1000 Cas9 protein orthologues available (Chylinski et al, 2014, Nucleic Acids Res. 42: 6091-6105; Hsu et al, 2014, Cell 157: 1262-1278), most spacers in Type II CRISPR arrays show only a few, if any, matches to the phage sequences present in databases, indicating that the vast majority of the phage universe is still unexplored. This constrains computational PAM identification methods and hinders the exploration of Cas9 protein diversity for genome editing applications.

As described herein, to address this problem a method was developed to empirically examine the PAM sequence requirements for any Cas9 protein. The method is based on the analysis of the in vitro cleavage products of a plasmid DNA library which contains a fixed protospacer target sequence and a stretch of 5 or 7 randomized base pairs in the putative PAM region. Based on the methods described herein, the stretch of randomized base pairs in the putative PAM region can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 base pairs.
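As a rough illustration of the scale of such a library, a stretch of n randomized base pairs drawn from A, C, G and T admits 4^n distinct PAM candidates, so 5 randomized positions give 1,024 variants and 7 give 16,384. The Python sketch below enumerates these candidates; the function names are hypothetical and the code is not the procedure used in the Examples, only an arithmetic aid.

```python
# Illustrative sketch only; function names are hypothetical.
from itertools import product
from typing import Iterator

NUCLEOTIDES = "ACGT"


def library_size(n_randomized: int) -> int:
    """Number of distinct sequences for n fully randomized positions."""
    return len(NUCLEOTIDES) ** n_randomized


def randomized_pams(n_randomized: int) -> Iterator[str]:
    """Lazily yield every possible randomized PAM of the given length."""
    for combo in product(NUCLEOTIDES, repeat=n_randomized):
        yield "".join(combo)


if __name__ == "__main__":
    for n in (5, 7):
        print(n, library_size(n))
```

The generator is lazy, which keeps memory use flat for longer randomized stretches where the full list would be very large.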

Using the method described herein, the canonical PAM preferences for Cas9 proteins of S. pyogenes and S. thermophilus CRISPR1 and CRISPR3 systems were first confirmed. Next, the method described herein was applied to identify the PAM and guide RNA requirements for a novel Cas9 protein from the Type II CRISPR-Cas system of B. laterosporus SPP360D4. In the Type II system of B. laterosporus, the transcriptional direction of the tracrRNA and CRISPR region could not be reliably predicted by computational approaches. Therefore, two single guide RNA (sgRNA) variants for both possible sense and anti-sense expression scenarios of the tracrRNA and CRISPR array (Examples 5-8, 10-12 described herein) were synthesized, and only one of the designed sgRNAs supported cleavage of the randomized PAM plasmid library by B. laterosporus Cas9. Deep sequencing analysis of the cleavage products revealed a novel PAM requirement for the B. laterosporus Cas9: a strong preference for a C residue at position 5 of the PAM sequence, followed by moderate preferences for A residues at positions 7 and 8, with an overall PAM consensus of NNNNCNDD (N=G, C, A or T; D=A, G or T). With a strong preference for just a single nucleotide, B. laterosporus Cas9 provides a useful addition to the Cas9 genome editing toolbox.
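For illustration, the consensus NNNNCNDD can be read as a pattern over the standard IUPAC nucleotide codes and tested against a candidate sequence. The following Python sketch makes that reading concrete; the function names are hypothetical, and a simple yes/no match of this kind ignores the graded (strong versus moderate) preferences that the sequencing data describe.

```python
# Illustrative sketch only; function names are hypothetical.
# Maps standard IUPAC nucleotide codes to regular-expression character classes.
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Compile an IUPAC consensus such as NNNNCNDD into a regex."""
    return re.compile("".join(IUPAC[code] for code in consensus.upper()))


def matches_pam(candidate: str, consensus: str = "NNNNCNDD") -> bool:
    """True if the candidate has the consensus length and fits every position."""
    return consensus_to_regex(consensus).fullmatch(candidate.upper()) is not None


if __name__ == "__main__":
    print(matches_pam("AGGTCTAA"))  # C at position 5, A at positions 7 and 8
    print(matches_pam("AGGTGTAA"))  # G at position 5
```

Because `fullmatch` is used, a candidate shorter or longer than the consensus is rejected rather than matched on a substring.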

To examine the genome editing potential of a novel Cas9 and sgRNA characterized with the method described herein, the B. laterosporus SPP360D4 Cas9 and sgRNA were tested in maize (Examples 5-8, 10-12, described herein). As a result of cleavage, imperfect DNA repair resulted in INDEL mutations at all 3 chromosomal sites tested, with robust INDEL frequencies observed at 2 of the 3 sites. Interestingly, at one of the sites, a ~30% enhancement in the recovery of INDEL mutations was observed for the B. laterosporus Cas9 over the S. pyogenes Cas9 (Example 12).

In one embodiment described herein it is shown that cleavage of permissive PAMs is dependent on Cas9 concentration. For all Cas9 proteins analyzed, PAM sequences licensing plasmid DNA cleavage at higher (50 nM) Cas9 concentrations were more relaxed than PAM sequences identified at low (0.5 nM) Cas9 concentrations. This finding corroborates previous studies which demonstrated that lowering Cas9 concentration and shortening cleavage time prevents off-target cleavage by S. pyogenes Cas9 (Pattanayak et al, 2013, Nat. Biotechnol.: 1-7; Lin et al, 2014, Elife 3: e04766. doi: 10.7554/eLife.04766.). Since most other PAM determination methods have been performed in cells or cell extracts by expressing Cas9 at undefined concentrations (Ran et al, 2015, Nature 520: 186-91. doi: 10.1038/nature14299; Jiang et al, 2013, Nat. Biotechnol. 31: 233-9; Esvelt et al, 2013, Nat. Methods 10: 1116-21. doi: 10.1038/nmeth.2681; Kleinstiver et al, 2015), our method further refines PAM specificity assessments by the dose-dependent control of recombinant Cas9 protein in vitro. This allows the careful, detailed examination of Cas9 PAM specificity as a function of Cas9-guide RNA complex concentration.

In one embodiment, the method described herein further refines Cas9 PAM discovery efforts by the use of recombinant Cas9 protein and reframes PAM specificity as being non-static and dependent on Cas9-guide RNA complex concentration.

Described herein are novel Cas endonucleases derived from diverse organisms capable of forming guide polynucleotide/Cas endonuclease complexes with guide polynucleotides comprising crRNA and tracrRNA sequence fragments derived from their respective organisms. In one example, a Cas endonuclease derived from Brevibacillus laterosporus (SEQ ID NO: 140) was able to form a RGEN complex with a guide polynucleotide comprising a crRNA and a tracrRNA fragment derived from Brevibacillus laterosporus (such as SEQ ID NO: 47 or 127).

The Cas endonucleases described herein can also be used in complexes with guide polynucleotides derived from other Cas systems. In one example, the crRNA and/or tracrRNA domains of a guide polynucleotide capable of forming a complex with a Cas endonuclease from organism 1 (such that said RGEN complex is capable of recognizing, binding to, and optionally nicking or cleaving all or part of a specific DNA target sequence), can be exchanged with a crRNA and/or tracrRNA domain, or fragment thereof, derived from a different organism (organism 2), thereby forming a chimeric guide, and still be able to form a functional complex with the Cas endonuclease derived from organism 1.

Similarities in guide RNAs between different Cas systems can be determined based on sequence composition and secondary structures of the guide RNAs. In one example, the secondary structure and sequence similarity of the sgRNAs from Lactobacillus reuteri MIc3 (Lreu) (SEQ ID NO: 114), Lactobacillus rossiae DSM 15814 (Lros) (SEQ ID NO: 115) and Pediococcus pentosaceus SL4 (Ppen) (SEQ ID NO: 116) were determined, revealing that these three sgRNAs have very similar secondary structures. It is anticipated that fragments from Lreu, Lros and Ppen guide RNAs, such as but not limited to repeat structures or anti-repeat structures or any one guide RNA domain, can be exchanged and/or mixed with one another to create chimeric guides capable of forming a RGEN with any one of the Lreu, Lros or Ppen Cas endonucleases (SEQ ID NOs: 81, 82 and 93, respectively). In another example, the secondary structure and sequence similarity of the sgRNAs from Lactobacillus nodensis JCM 14932 (Lnod) (SEQ ID NO: 117), Loktanella vestfoldensis (Lves) (SEQ ID NO: 121) and Sphingomonas sanxanigenens NX02 (Ssan) (SEQ ID NO: 122) were determined to be very similar, indicating that fragments from Lnod, Lves and Ssan guide RNAs, such as but not limited to repeat structures or anti-repeat structures or any one guide RNA domain, can be exchanged and/or mixed with one another to create chimeric guides capable of forming a RGEN with any one of the Lnod, Lves or Ssan Cas endonucleases (SEQ ID NOs: 84, 87 and 88, respectively).

In another example, the secondary structure and sequence similarity of the sgRNAs from Epilithonimonas tenax DSM 16811 (Eten) (SEQ ID NO: 123), Sporocytophaga myxococcoides (Smyx) (SEQ ID NO: 138) and Psychroflexus torquis ATCC 700755 (Ptor) (SEQ ID NO: 139) were determined to be very similar, indicating that fragments from Eten, Smyx and Ptor guide RNAs, such as but not limited to repeat structures or anti-repeat structures or any one guide RNA domain, can be exchanged and/or mixed with one another to create chimeric guides capable of forming a RGEN with any one of the Eten, Smyx or Ptor Cas endonucleases (SEQ ID NOs: 89, 90 and 91, respectively).

In one aspect, the Cas endonuclease and the crRNA and/or tracrRNA (or sgRNA) capable of forming a functional complex are derived or obtained from phylogenetically related groups. (See, for example, Fonfara et al, 2014, Nucleic Acids Res. 42: 2577-2590.) It is understood that, based on the components of the novel Cas endonuclease systems described herein (crRNAs, tracrRNAs, Cas endonucleases, PAM sequences), one skilled in the art can exchange and/or mix any one component derived from one organism with any one component derived from another organism to make a functional guide polynucleotide/Cas endonuclease complex.

Guide polynucleotides can be modified to contain a different sequence or structure yet be functionally equivalent or possess superior activity (binding, cutting, specificity). In one aspect, the chimeric guide polynucleotide can comprise at least one nucleotide, phosphodiester bond or linkage modification, or chemical modification such as, but not limited to, Locked Nucleic Acid (LNA), 5-methyl dC, 2,6-Diaminopurine, 2′-Fluoro A, 2′-Fluoro U, 2′-O-Methyl RNA, 2′-O-Methyl (M) modification, 2′-O-Methyl 3′ phosphorothioate (MS) modification, 2′-O-Methyl 3′ thioPACE (MSP) modification, phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 (hexaethylene glycol chain) molecule, or 5′ to 3′ covalent linkage resulting in circularization (Hendel et al. 2015 Nature Biotechnology Vol. 33 pg. 985-991). Chimeric guide polynucleotides can be generated chemically, with or without sugar or backbone modifications. Chimeric guide polynucleotides can also be generated by in vitro transcription or delivered by DNA molecules containing promoters for expression.

The PAM interacting domain, HNH or HNH-like nuclease domain, and/or RuvC or RuvC-like nuclease domains from the Cas endonuclease proteins described herein find use for creating Cas scaffolds (US2016/0102324 entitled “New compact scaffold of Cas9 in the type II CRISPR system”, published Apr. 14, 2016 and incorporated herein by reference). The boundaries of the PAM interacting domain, RuvC and HNH domains of the Cas endonuclease described herein can be determined and new shorter Cas endonucleases derived from the Cas endonucleases described herein (or any one functional combination/fusion protein thereof) can be designed.

The terms “targeting”, “gene targeting” and “DNA targeting” are used interchangeably herein. DNA targeting herein may be the specific introduction of a knock-out, edit, or knock-in at a particular DNA sequence, such as in a chromosome or plasmid of a cell. In general, DNA targeting can be performed herein by cleaving one or both strands at a specific DNA sequence in a cell with a Cas protein associated with a suitable polynucleotide component. Such DNA cleavage, if a double-strand break (DSB), can prompt NHEJ or HDR processes which can lead to modifications at the target site.

The terms “knock-out”, “gene knock-out” and “genetic knock-out” are used interchangeably herein. A knock-out represents a DNA sequence of a cell that has been rendered partially or completely inoperative by targeting with a Cas protein; such a DNA sequence prior to knock-out could have encoded an amino acid sequence, or could have had a regulatory function (e.g., promoter), for example. A knock-out may be produced by an indel (insertion or deletion of nucleotide bases in a target DNA sequence through NHEJ), or by specific removal of sequence that reduces or completely destroys the function of sequence at or near the targeting site.

In one embodiment of the disclosure, the method comprises a method for modifying a target site in the genome of a cell, the method comprising providing to said cell at least one Cas9 endonuclease originating from an organism selected from the group consisting of Brevibacillus laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755, and at least one guide RNA, wherein said guide RNA and Cas endonuclease can form a complex that is capable of recognizing, binding to, and optionally nicking or cleaving all or part of said target site. The embodiment can further comprise identifying at least one cell that has a modification at said target site, wherein the modification at said target site is selected from the group consisting of (i) a replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, and (iv) any combination of (i)-(iii).

The guide polynucleotide/Cas endonuclease system can be used in combination with a co-delivered polynucleotide modification template to allow for editing (modification) of a genomic nucleotide sequence of interest. (See also U.S. Patent Application US 2015-0082478 A1, published on Mar. 19, 2015 and WO2015/026886 A1, published on Feb. 26, 2015, both of which are hereby incorporated in their entirety by reference.)

A “modified nucleotide” or “edited nucleotide” refers to a nucleotide sequence of interest that comprises at least one alteration when compared to its non-modified nucleotide sequence. Such “alterations” include, for example: (i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, or (iv) any combination of (i)-(iii).

The term “polynucleotide modification template” includes a polynucleotide that comprises at least one nucleotide modification when compared to the nucleotide sequence to be edited. A nucleotide modification can be at least one nucleotide substitution, addition or deletion. Optionally, the polynucleotide modification template can further comprise homologous nucleotide sequences flanking the at least one nucleotide modification, wherein the flanking homologous nucleotide sequences provide sufficient homology to the desired nucleotide sequence to be edited.

In one embodiment, the disclosure describes a method for editing a nucleotide sequence in the genome of a cell, the method comprising providing a guide polynucleotide, a polynucleotide modification template, and at least one Cas endonuclease to a cell, wherein the Cas endonuclease is capable of introducing a single or double-strand break at a target sequence in the genome of said cell, wherein said polynucleotide modification template includes at least one nucleotide modification of said nucleotide sequence. Cells include, but are not limited to, human, non-human, animal, bacterial, fungal, insect, yeast, and plant cells as well as plants and seeds produced by the methods described herein. Plant cells include cells selected from the group consisting of maize, rice, sorghum, rye, barley, wheat, millet, oats, sugarcane, turfgrass, switchgrass, soybean, canola, alfalfa, sunflower, cotton, tobacco, peanut, potato, Arabidopsis, and safflower cells. The nucleotide to be edited can be located within or outside a target site recognized and cleaved by a Cas endonuclease. In one embodiment, the at least one nucleotide modification is not a modification at a target site recognized and cleaved by a Cas endonuclease. In another embodiment, there are at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 30, 40, 50, 100, 200, 300, 400, 500, 600, 700, 900 or 1000 nucleotides between the at least one nucleotide to be edited and the genomic target site.

In one embodiment of the disclosure, the method comprises a method for editing a nucleotide sequence in the genome of a cell, the method comprising providing to said cell at least one Cas9 endonuclease originating from an organism selected from the group consisting of Brevibacillus laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755, a polynucleotide modification template, and at least one guide RNA, wherein said polynucleotide modification template comprises at least one nucleotide modification of said nucleotide sequence, wherein said guide RNA and Cas endonuclease can form a complex that is capable of recognizing, binding to, and optionally nicking or cleaving all or part of said target site. Cells include, but are not limited to, human, non-human, animal, bacterial, fungal, insect, yeast, and plant cells as well as plants and seeds produced by the methods described herein.

Genome editing can be accomplished using any method of gene editing available. For example, gene editing can be accomplished through the introduction into a host cell of a polynucleotide modification template (sometimes also referred to as a gene repair oligonucleotide) containing a targeted modification to a gene within the genome of the host cell. The polynucleotide modification template for use in such methods can be either single-stranded or double-stranded. Examples of such methods are generally described, for example, in US Publication No. 2013/0019349.

In some embodiments, gene editing may be facilitated through the induction of a double-stranded break (DSB) in a defined position in the genome near the desired alteration. DSBs can be induced using any DSB-inducing agent available, including, but not limited to, TALENs, meganucleases, zinc finger nucleases, Cas9-gRNA systems (based on bacterial CRISPR-Cas systems), and the like. In some embodiments, the introduction of a DSB can be combined with the introduction of a polynucleotide modification template.

The process for editing a genomic sequence combining DSB and modification templates generally comprises: providing to a host cell a DSB-inducing agent, or a nucleic acid encoding a DSB-inducing agent, that recognizes a target sequence in the chromosomal sequence and is able to induce a DSB in the genomic sequence, and at least one polynucleotide modification template comprising at least one nucleotide alteration when compared to the nucleotide sequence to be edited. The polynucleotide modification template can further comprise nucleotide sequences flanking the at least one nucleotide alteration, in which the flanking sequences are substantially homologous to the chromosomal region flanking the DSB. Genome editing using DSB-inducing agents, such as Cas9-gRNA complexes, has been described, for example in U.S. Patent Application US 2015-0082478 A1, published on Mar. 19, 2015, WO2015/026886 A1, published on Feb. 26, 2015, U.S. application 62/023,246, filed on Jul. 7, 2014, and U.S. application 62/036,652, filed on Aug. 13, 2014, all of which are incorporated by reference herein.

The terms “knock-in”, “gene knock-in”, “gene insertion” and “genetic knock-in” are used interchangeably herein. A knock-in represents the replacement or insertion of a DNA sequence at a specific DNA sequence in a cell by targeting with a Cas protein (by HR, wherein a suitable donor DNA polynucleotide is also used). Examples of knock-ins are a specific insertion of a heterologous amino acid coding sequence in a coding region of a gene, or a specific insertion of a transcriptional regulatory element in a genetic locus.

Various methods and compositions can be employed to obtain a cell or organism having a polynucleotide of interest inserted in a target site for a Cas endonuclease. Such methods can employ homologous recombination to provide integration of the polynucleotide of interest at the target site. In one method provided, a polynucleotide of interest is provided to the cell or organism in a donor DNA construct. As used herein, “donor DNA” is a DNA construct that comprises a polynucleotide of interest to be inserted into the target site of a Cas endonuclease. The donor DNA construct further comprises a first and a second region of homology that flank the polynucleotide of interest. The first and second regions of homology of the donor DNA share homology to a first and a second genomic region, respectively, present in or flanking the target site of the cell or organism genome. By “homology” is meant DNA sequences that are similar. For example, a “region of homology to a genomic region” that is found on the donor DNA is a region of DNA that has a similar sequence to a given “genomic region” in the cell or organism genome. A region of homology can be of any length that is sufficient to promote homologous recombination at the cleaved target site. For example, the region of homology can comprise at least 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, 5-60, 5-65, 5-70, 5-75, 5-80, 5-85, 5-90, 5-95, 5-100, 5-200, 5-300, 5-400, 5-500, 5-600, 5-700, 5-800, 5-900, 5-1000, 5-1100, 5-1200, 5-1300, 5-1400, 5-1500, 5-1600, 5-1700, 5-1800, 5-1900, 5-2000, 5-2100, 5-2200, 5-2300, 5-2400, 5-2500, 5-2600, 5-2700, 5-2800, 5-2900, 5-3000, 5-3100 or more bases in length such that the region of homology has sufficient homology to undergo homologous recombination with the corresponding genomic region. “Sufficient homology” indicates that two polynucleotide sequences have sufficient structural similarity to act as substrates for a homologous recombination reaction. The structural similarity includes overall length of each polynucleotide fragment, as well as the sequence similarity of the polynucleotides. Sequence similarity can be described by the percent sequence identity over the whole length of the sequences, and/or by conserved regions comprising localized similarities such as contiguous nucleotides having 100% sequence identity, and percent sequence identity over a portion of the length of the sequences.

The amount of homology or sequence identity shared by a target and a donor polynucleotide can vary and includes total lengths and/or regions having unit integral values in the ranges of about 1-20 bp, 20-50 bp, 50-100 bp, 75-150 bp, 100-250 bp, 150-300 bp, 200-400 bp, 250-500 bp, 300-600 bp, 350-750 bp, 400-800 bp, 450-900 bp, 500-1000 bp, 600-1250 bp, 700-1500 bp, 800-1750 bp, 900-2000 bp, 1-2.5 kb, 1.5-3 kb, 2-4 kb, 2.5-5 kb, 3-6 kb, 3.5-7 kb, 4-8 kb, 5-10 kb, or up to and including the total length of the target site. These ranges include every integer within the range, for example, the range of 1-20 bp includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 bps. The amount of homology can also be described by percent sequence identity over the full aligned length of the two polynucleotides, which includes percent sequence identity of about at least 50%, 55%, 60%, 65%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%. Sufficient homology includes any combination of polynucleotide length, global percent sequence identity, and optionally conserved regions of contiguous nucleotides or local percent sequence identity; for example, sufficient homology can be described as a region of 75-150 bp having at least 80% sequence identity to a region of the target locus. Sufficient homology can also be described by the predicted ability of two polynucleotides to specifically hybridize under high stringency conditions, see, for example, Sambrook et al., (1989) Molecular Cloning: A Laboratory Manual, (Cold Spring Harbor Laboratory Press, NY); Current Protocols in Molecular Biology, Ausubel et al., Eds (1994) Current Protocols, (Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.); and, Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, (Elsevier, New York).
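The example criterion given above (a region of 75-150 bp having at least 80% sequence identity to a region of the target locus) can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not a method of the disclosure: the function names and default thresholds are hypothetical, and the two sequences are assumed to be already aligned without gaps and of equal length, so identity is simply the fraction of matching positions.

```python
# Illustrative sketch only; names and default thresholds are hypothetical.
# Assumes both sequences are pre-aligned, ungapped, and of equal length.


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def has_sufficient_homology(
    donor_region: str,
    genomic_region: str,
    min_len: int = 75,
    max_len: int = 150,
    min_identity: float = 80.0,
) -> bool:
    """Check one example criterion: length in range and identity above a floor."""
    if not min_len <= len(donor_region) <= max_len:
        return False
    return percent_identity(donor_region, genomic_region) >= min_identity


if __name__ == "__main__":
    genomic = "ACGT" * 25                # 100 bp toy genomic region
    donor = genomic.replace("A", "C")    # mismatch at every fourth position
    print(percent_identity(donor, genomic))
    print(has_sufficient_homology(donor, genomic))
```

Real donor design would normally rely on a proper alignment that allows gaps; the ungapped comparison here is chosen only to keep the arithmetic transparent.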

In one embodiment of the disclosure, the method comprises a method for modifying a target site in the genome of a cell, the method comprising providing to said cell at least one guide RNA, at least one donor DNA, and at least one Cas9 endonuclease originating from an organism selected from the group consisting of Brevibacillus laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755, wherein said at least one guide RNA and at least one Cas endonuclease can form a complex that is capable of recognizing, binding to, and optionally nicking or cleaving all or part of said target site, wherein said donor DNA comprises a polynucleotide of interest. Cells include, but are not limited to, human, non-human, animal, bacterial, fungal, insect, yeast, and plant cells as well as plants and seeds produced by the methods described herein. The embodiment can further comprise identifying at least one cell in which said polynucleotide of interest has integrated in or near said target site.

As used herein, a “genomic region” is a segment of a chromosome in the genome of a cell that is present on either side of the target site or, alternatively, also comprises a portion of the target site. The genomic region can comprise at least 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, 5-60, 5-65, 5-70, 5-75, 5-80, 5-85, 5-90, 5-95, 5-100, 5-200, 5-300, 5-400, 5-500, 5-600, 5-700, 5-800, 5-900, 5-1000, 5-1100, 5-1200, 5-1300, 5-1400, 5-1500, 5-1600, 5-1700, 5-1800, 5-1900, 5-2000, 5-2100, 5-2200, 5-2300, 5-2400, 5-2500, 5-2600, 5-2700, 5-2800, 5-2900, 5-3000, 5-3100 or more bases such that the genomic region has sufficient homology to undergo homologous recombination with the corresponding region of homology.

Polynucleotides of interest and/or traits can be stacked together in a complex trait locus as described in US-2013-0263324-A1, published 3 Oct. 2013 and in PCT/US13/22891, published Jan. 24, 2013, both of which are hereby incorporated by reference. The guide polynucleotide/Cas9 endonuclease system described herein provides for an efficient system to generate double strand breaks and allows for traits to be stacked in a complex trait locus.

The guide polynucleotide/Cas endonuclease system can be used for introducing one or more polynucleotides of interest or one or more traits of interest into one or more target sites by providing one or more guide polynucleotides, one Cas endonuclease, and optionally one or more donor DNAs to a plant cell (as described in U.S. patent application Ser. No. 14/463,687, filed Aug. 20, 2014, incorporated by reference herein). A fertile plant can be produced from that plant cell that comprises an alteration at said one or more target sites, wherein the alteration is selected from the group consisting of (i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, and (iv) any combination of (i)-(iii). Plants comprising these altered target sites can be crossed with plants comprising at least one gene or trait of interest in the same complex trait locus, thereby further stacking traits in said complex trait locus (see also US-2013-0263324-A1, published 3 Oct. 2013 and PCT/US13/22891, published Jan. 24, 2013).

The structural similarity between a given genomic region and the corresponding region of homology found on the donor DNA can be any degree of sequence identity that allows for homologous recombination to occur. For example, the amount of homology or sequence identity shared by the “region of homology” of the donor DNA and the “genomic region” of the organism genome can be at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity, such that the sequences undergo homologous recombination.

The region of homology on the donor DNA can have homology to any sequence flanking the target site. While in some embodiments the regions of homology share significant sequence homology to the genomic sequence immediately flanking the target site, it is recognized that the regions of homology can be designed to have sufficient homology to regions that may be further 5′ or 3′ to the target site. In still other embodiments, the regions of homology can also have homology with a fragment of the target site along with downstream genomic regions. In one embodiment, the first region of homology further comprises a first fragment of the target site and the second region of homology comprises a second fragment of the target site, wherein the first and second fragments are dissimilar.

As used herein, “homologous recombination” includes the exchange of DNA fragments between two DNA molecules at the sites of homology. The frequency of homologous recombination is influenced by a number of factors. Different organisms vary with respect to the amount of homologous recombination and the relative proportion of homologous to non-homologous recombination. Generally, the length of the region of homology affects the frequency of homologous recombination events: the longer the region of homology, the greater the frequency. The length of the homology region needed to observe homologous recombination is also species-variable. In many cases, at least 5 kb of homology has been utilized, but homologous recombination has been observed with as little as 25-50 bp of homology. See, for example, Singer et al., (1982) Cell 31:25-33; Shen and Huang, (1986) Genetics 112:441-57; Watt et al., (1985) Proc. Natl. Acad. Sci. USA 82:4768-72; Sugawara and Haber, (1992) Mol Cell Biol 12:563-75; Rubnitz and Subramani, (1984) Mol Cell Biol 4:2253-8; Ayares et al., (1986) Proc. Natl. Acad. Sci. USA 83:5199-203; Liskay et al., (1987) Genetics 115:161-7.

Homology-directed repair (HDR) is a mechanism in cells to repair double-stranded and single-stranded DNA breaks. Homology-directed repair includes homologous recombination (HR) and single-strand annealing (SSA) (Lieber. 2010 Annu. Rev. Biochem. 79:181-211). The most common form of HDR is called homologous recombination (HR), which has the longest sequence homology requirements between the donor and acceptor DNA. Other forms of HDR include single-stranded annealing (SSA) and breakage-induced replication, and these require shorter sequence homology relative to HR. Homology-directed repair at nicks (single-stranded breaks) can occur via a mechanism distinct from HDR at double-strand breaks (Davis and Maizels. PNAS (0027-8424), 111 (10), p. E924-E932).

Alteration of the genome of a plant cell, for example, through homologous recombination (HR), is a powerful tool for genetic engineering. Despite the low frequency of homologous recombination in higher plants, there are a few examples of successful homologous recombination of plant endogenous genes. The parameters for homologous recombination in plants have primarily been investigated by rescuing introduced truncated selectable marker genes. In these experiments, the homologous DNA fragments were typically between 0.3 kb and 2 kb. Observed frequencies for homologous recombination were on the order of 10⁻⁴ to 10⁻⁵. See, for example, Halfter et al., (1992) Mol Gen Genet 231:186-93; Offringa et al., (1990) EMBO J 9:3077-84; Offringa et al., (1993) Proc. Natl. Acad. Sci. USA 90:7346-50; Paszkowski et al., (1988) EMBO J 7:4021-6; Hourda and Paszkowski, (1994) Mol Gen Genet 243:106-11; and Risseeuw et al., (1995) Plant J 7:109-19.

Homologous recombination has been demonstrated in insects. In Drosophila, Dray and Gloor found that as little as 3 kb of total template:target homology sufficed to copy a large non-homologous segment of DNA into the target with reasonable efficiency (Dray and Gloor, (1997) Genetics 147:689-99). Using FLP-mediated DNA integration at a target FRT in Drosophila, Golic et al., showed integration was approximately 10-fold more efficient when the donor and target shared 4.1 kb of homology as compared to 1.1 kb of homology (Golic et al., (1997) Nucleic Acids Res 25:3665). Data from Drosophila indicates that 2-4 kb of homology is sufficient for efficient targeting, but there is some evidence that much less homology may suffice, on the order of about 30 bp to about 100 bp (Nassif and Engels, (1993) Proc. Natl. Acad. Sci. USA 90:1262-6; Keeler and Gloor, (1997) Mol Cell Biol 17:627-34).

Homologous recombination has also been accomplished in other organisms. For example, at least 150-200 bp of homology was required for homologous recombination in the parasitic protozoan Leishmania (Papadopoulou and Dumas, (1997) Nucleic Acids Res 25:4278-86). In the filamentous fungus Aspergillus nidulans, gene replacement has been accomplished with as little as 50 bp flanking homology (Chaveroche et al., (2000) Nucleic Acids Res 28:e97). Targeted gene replacement has also been demonstrated in the ciliate Tetrahymena thermophila (Gaertig et al., (1994) Nucleic Acids Res 22:5391-8). In mammals, homologous recombination has been most successful in the mouse using pluripotent embryonic stem cell lines (ES) that can be grown in culture, transformed, selected and introduced into a mouse embryo. Embryos bearing inserted transgenic ES cells develop as genetically chimeric offspring. By interbreeding siblings, homozygous mice carrying the selected genes can be obtained. An overview of the process is provided in Watson et al., (1992) Recombinant DNA, 2nd Ed., (Scientific American Books distributed by WH Freeman & Co.); Capecchi, (1989) Trends Genet 5:70-6; and Bronson, (1994) J Biol Chem 269:27155-8. Homologous recombination in mammals other than mouse has been limited by the lack of stem cells capable of being transplanted to oocytes or developing embryos. However, McCreath et al., Nature 405:1066-9 (2000) reported successful homologous recombination in sheep by transformation and selection in primary embryo fibroblast cells.

Error-prone DNA repair mechanisms can produce mutations at double-strand break sites. The Non-Homologous-End-Joining (NHEJ) pathways are the most common repair mechanism to bring the broken ends together (Bleuyard et al., (2006) DNA Repair 5:1-12). The structural integrity of chromosomes is typically preserved by the repair, but deletions, insertions, or other rearrangements are possible. The two ends of one double-strand break are the most prevalent substrates of NHEJ (Kirik et al., (2000) EMBO J 19:5562-6); however, if two different double-strand breaks occur, the free ends from different breaks can be ligated and result in chromosomal deletions (Siebert and Puchta, (2002) Plant Cell 14:1121-31), or chromosomal translocations between different chromosomes (Pacher et al., (2007) Genetics 175:21-9).

Episomal DNA molecules can also be ligated into the double-strand break, for example, integration of T-DNAs into chromosomal double-strand breaks (Chilton and Que, (2003) Plant Physiol 133:956-65; Salomon and Puchta, (1998) EMBO J 17:6086-95). Once the sequence around the double-strand breaks is altered, for example, by exonuclease activities involved in the maturation of double-strand breaks, gene conversion pathways can restore the original structure if a homologous sequence is available, such as a homologous chromosome in non-dividing somatic cells, or a sister chromatid after DNA replication (Molinier et al., (2004) Plant Cell 16:342-52). Ectopic and/or epigenic DNA sequences may also serve as a DNA repair template for homologous recombination (Puchta, (1999) Genetics 152:1173-81).

Once a double-strand break is induced in the DNA, the cell's DNA repair mechanism is activated to repair the break. Error-prone DNA repair mechanisms can produce mutations at double-strand break sites. The most common repair mechanism to bring the broken ends together is the nonhomologous end-joining (NHEJ) pathway (Bleuyard et al., (2006) DNA Repair 5:1-12). The structural integrity of chromosomes is typically preserved by the repair, but deletions, insertions, or other rearrangements are possible (Siebert and Puchta, (2002) Plant Cell 14:1121-31; Pacher et al., (2007) Genetics 175:21-9).

Alternatively, the double-strand break can be repaired by homologous recombination between homologous DNA sequences. Once the sequence around the double-strand break is altered, for example, by exonuclease activities involved in the maturation of double-strand breaks, gene conversion pathways can restore the original structure if a homologous sequence is available, such as a homologous chromosome in non-dividing somatic cells, or a sister chromatid after DNA replication (Molinier et al., (2004) Plant Cell 16:342-52). Ectopic and/or epigenic DNA sequences may also serve as a DNA repair template for homologous recombination (Puchta, (1999) Genetics 152:1173-81).

DNA double-strand breaks appear to be an effective factor to stimulate homologous recombination pathways (Puchta et al., (1995) Plant Mol Biol 28:281-92; Tzfira and White, (2005) Trends Biotechnol 23:567-9; Puchta, (2005) J Exp Bot 56:1-14). Using DNA-breaking agents, a two- to nine-fold increase of homologous recombination was observed between artificially constructed homologous DNA repeats in plants (Puchta et al., (1995) Plant Mol Biol 28:281-92). In maize protoplasts, experiments with linear DNA molecules demonstrated enhanced homologous recombination between plasmids (Lyznik et al., (1991) Mol Gen Genet 230:209-18).

The donor DNA may be introduced by any means known in the art. For example, a plant having a target site is provided. The donor DNA may be provided by any transformation method known in the art including, for example, Agrobacterium-mediated transformation or biolistic particle bombardment. The donor DNA may be present transiently in the cell or it could be introduced via a viral replicon. In the presence of the Cas endonuclease and the target site, the donor DNA is inserted into the transformed plant's genome.

Further uses for guide RNA/Cas endonuclease systems have been described (See U.S. Patent Application US 2015-0082478 A1, published on Mar. 19, 2015, WO2015/026886 A1, published on Feb. 26, 2015, US 2015-0059010 A1, published on Feb. 26, 2015, U.S. application 62/023,246, filed on Jul. 7, 2014, and U.S. application 62/036,652, filed on Aug. 13, 2014, all of which are incorporated by reference herein) and include but are not limited to modifying or replacing nucleotide sequences of interest (such as regulatory elements), insertion of polynucleotides of interest, gene knock-out, gene knock-in, modification of splicing sites and/or introducing alternate splicing sites, modifications of nucleotide sequences encoding a protein of interest, amino acid and/or protein fusions, and gene silencing by expressing an inverted repeat into a gene of interest.

Given the diversity of Type II CRISPR-Cas systems (Fonfara et al. (2014) Nucleic Acids Res. 42:2577-2590), it is plausible that many of the Cas9 endonucleases and cognate guide RNAs may have unique sequence recognition and enzymatic properties different from those previously described or characterized. For example, cleavage activity and specificity may be enhanced, or the proto-spacer adjacent motif (PAM) sequence may be different, leading to increased genomic target site density. To tap into this vast unexplored diversity and expand the repertoire of Cas9 endonucleases and cognate guide RNAs available for genome targeting, two components of target site recognition need to be cooperatively characterized for each new system: the PAM sequence and the guide RNA (either duplexed CRISPR RNA (crRNA) and trans-activating CRISPR RNA (tracrRNA) or chimeric fusion of crRNA and tracrRNA (single guide RNA (sgRNA))). Rapid in vitro methods described herein have been developed to concertedly characterize both the guide RNA and PAM sequence of Type II Cas9 proteins.

Methods for assaying Cas9 PAM preferences have been described herein (see Example 3, Example 4 and Example 7). In one embodiment, the Cas9 endonuclease PAM preferences were assayed in a dose dependent manner by subjecting the randomized PAM libraries described herein to in vitro digestion with different concentrations of recombinant Cas9 protein preloaded with guide RNA. After digestion with Cas9-guide RNA ribonucleoprotein (RNP) complexes, PAM sequence combinations from the randomized PAM library that supported cleavage were captured by ligating adapters to the free ends of the plasmid DNA molecules cleaved by the Cas9-guide RNA complex (FIG. 3). To promote efficient ligation and capture of the cleaved ends, the typically blunt-ended double-stranded DNA cut generated by Cas9 endonucleases was modified to contain a 3′ dA overhang and adapters were modified to contain a complementary 3′ dT overhang. To generate sufficient quantities of DNA for sequencing, DNA fragments harboring the PAM sequence supporting cleavage were PCR amplified using a primer in the adapter and another directly adjacent to the PAM region. The resulting PCR amplified Cas9 PAM libraries were converted into ampli-seq templates and single-read deep sequenced from the adapter-side of the amplicon. To ensure adequate coverage, the Cas9 PAM libraries were sequenced to a depth at least 5 times greater than the diversity in the initial randomized PAM library (5,120 and 81,920 reads for the 5 and 7 bp PAM randomized libraries, respectively). PAM sequences were identified from the resulting sequence data by selecting only those reads containing a 12 nt sequence match flanking either side of the 5 or 7 nt PAM sequence (depending on the randomized PAM library used), thereby capturing only those PAM sequences resulting from perfect Cas9-guide RNA target site recognition and cleavage. To compensate for the inherent bias in the initial randomized PAM libraries, the frequency of each PAM sequence was normalized to its frequency in the starting library. The composition of the resulting PAM sequences can then be examined using a position frequency matrix (PFM) (Stormo, 2013 Quant. Biol. 1: 115-130).
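The read-processing steps described above (read-depth target, flank-based PAM extraction, normalization to the starting library, and construction of a position frequency matrix) can be sketched as follows. This is an illustrative outline only and not the analysis pipeline actually used. All function names are hypothetical, the flanking sequences are supplied by the user, and for simplicity only the first occurrence of the left flank in each read is examined.

```python
from collections import Counter

BASES = "ACGT"


def required_reads(pam_len: int, fold: int = 5) -> int:
    """Read depth equal to `fold` times the diversity (4**pam_len) of the library."""
    return fold * 4 ** pam_len


def extract_pam(read: str, left_flank: str, right_flank: str, pam_len: int):
    """Return the PAM only if both flanks match perfectly around it, else None."""
    start = read.find(left_flank)
    if start < 0:
        return None
    pam_start = start + len(left_flank)
    pam_end = pam_start + pam_len
    if read[pam_end:pam_end + len(right_flank)] != right_flank:
        return None
    pam = read[pam_start:pam_end]
    return pam if set(pam) <= set(BASES) else None


def normalize_to_library(cleaved: Counter, initial: Counter) -> dict:
    """Divide each PAM's frequency after cleavage by its frequency in the
    starting library, then rescale so the weights sum to 1."""
    total_cleaved = sum(cleaved.values())
    total_initial = sum(initial.values())
    ratios = {
        pam: (n / total_cleaved) / (initial[pam] / total_initial)
        for pam, n in cleaved.items()
        if initial.get(pam, 0) > 0  # PAMs absent from the starting library are skipped
    }
    scale = sum(ratios.values())
    return {pam: r / scale for pam, r in ratios.items()}


def position_frequency_matrix(weights: dict, pam_len: int) -> list:
    """Sum the normalized weights of each base at each PAM position."""
    pfm = [{b: 0.0 for b in BASES} for _ in range(pam_len)]
    for pam, w in weights.items():
        for i, base in enumerate(pam):
            pfm[i][base] += w
    return pfm
```

Under these assumptions, required_reads(5) and required_reads(7) reproduce the 5,120 and 81,920 read targets stated above for the 5 bp and 7 bp libraries.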

As described herein, to validate the randomness of the PAM library disclosed herein (PAM library validation), PCR fragments spanning the 5 bp and 7 bp randomized PAM regions were generated by Phusion High-Fidelity DNA Polymerase (Thermo Fisher Scientific) amplification (15 cycles of a 2-step amplification protocol) using the primer pair combinations TK-119/pUC-dir and TK-113/pUC-dir (SEQ ID NO: 175/SEQ ID NO: 5) for the 5 bp and 7 bp libraries, respectively. The resulting 145 bp PCR product was purified using GeneJET PCR Purification Kit (Thermo Fisher Scientific) and the sequences necessary for amplicon-specific barcodes and Illumina sequencing were “tailed” on through two rounds of PCR each consisting of 10 cycles. In some examples, the primer pair combinations in the first round of PCR were JKYS800.1/JKYS803 and JKYS921.1 (SEQ ID NO: 176)/JKYS812 (SEQ ID NO: 32) for the 5 bp and 7 bp libraries, respectively. A set of primers, JKYS557 (SEQ ID NO: 177)/JKYS558 (SEQ ID NO: 178), universal to all primary PCR reactions was utilized for the secondary PCR amplification. The resulting PCR amplifications were purified with a Qiagen PCR purification spin column, concentration measured with a Hoechst dye-based fluorometric assay, combined in an equimolar ratio and single read 60-100 nucleotide-length deep sequencing was performed on Illumina's MiSeq Personal Sequencer with a 5-10% (v/v) spike of PhiX control v3 (Illumina, FC-110-3001) to off-set sequence bias. The PAM sequences for only those reads containing a perfect 12 nt sequence match flanking either side of the randomized PAM sequence were captured and used to examine the frequency and diversity of PAM sequences present in the library.

In one embodiment of the disclosure, the method comprises a method for producing a plasmid DNA library containing a randomized Protospacer-Adjacent-Motif (PAM) sequence, the method comprising: a) providing a first single stranded oligonucleotide comprising a target sequence that can be recognized by a guide RNA/Cas endonuclease complex; b) providing a second single stranded oligonucleotide comprising a randomized PAM sequence adjacent to a nucleotide sequence capable of hybridizing with the target sequence of (a); c) producing an oligoduplex comprising said randomized PAM sequence by combining the first single stranded oligonucleotide of (a) and the second single stranded oligonucleotide of (b); d) producing a ligation product by ligating the oligoduplex from (c) with a linearized plasmid; and, e) transforming host cells with the ligation product of (d) and recovering multiple host cell colonies representing the plasmid library.

Host cells include, but are not limited to, human, non-human, animal, bacterial, fungal, insect, yeast, non-conventional yeast, and plant cells. One skilled in the art can ligate the oligoduplex of (c) directly into a linearized vector without restriction enzyme digestion, or can use two restriction enzyme sites, one upstream (5′) and one downstream (3′) of the target site. The first single stranded oligonucleotide can comprise a restriction endonuclease recognition site located upstream of a target sequence and the ligation product of (d) is produced by first cleaving the oligoduplex with a restriction endonuclease that recognizes the restriction endonuclease recognition site of (a) followed by ligating the cleaved oligoduplex from (d) with a linearized plasmid.

In one embodiment of the disclosure, the method comprises a method for producing a plasmid DNA library containing a randomized Protospacer-Adjacent-Motif (PAM) sequence, the method comprising transforming at least one host cell with a ligation product and recovering multiple host cell colonies representing the plasmid library, wherein said ligation product was generated by contacting a library of linear oligoduplexes with a linearized plasmid, wherein each oligoduplex member of said library of oligoduplexes comprises a first single stranded oligonucleotide comprising a target sequence, and a second single stranded oligonucleotide comprising a randomized PAM sequence adjacent to a nucleotide sequence capable of hybridizing with said target sequence. One skilled in the art can ligate the oligoduplex of (c) directly into a linearized vector without restriction enzyme digestion, or can use two restriction enzyme sites, one upstream (5′) and one downstream (3′) of the target site.

In one embodiment of the disclosure, the method comprises a method for producing a ligation product containing a randomized Protospacer-Adjacent-Motif (PAM) sequence, the method comprising: a) providing a first single stranded oligonucleotide comprising a restriction endonuclease recognition site located upstream of a target sequence that can be recognized by a guide RNA/Cas endonuclease complex; b) providing a second single stranded oligonucleotide comprising a randomized PAM sequence adjacent to a nucleotide sequence capable of hybridizing with the target sequence of (a); c) producing an oligoduplex comprising said randomized PAM sequence by combining the first single stranded oligonucleotide of (a) and the second single stranded oligonucleotide of (b); and, d) producing a ligation product by ligating the oligoduplex from (c) with a linearized plasmid.

In one embodiment of the disclosure, the method comprises a method for identification of a Protospacer-Adjacent-Motif (PAM) sequence, the method comprising: a) providing a library of plasmid DNAs, wherein each one of said plasmid DNAs comprises a randomized Protospacer-Adjacent-Motif sequence integrated adjacent to a target sequence that can be recognized by a guide RNA/Cas endonuclease complex; b) providing to said library of plasmids a guide RNA and a Cas endonuclease protein, wherein said guide RNA and Cas endonuclease protein can form a complex that is capable of introducing a double strand break into said target sequence, thereby creating a library of cleaved targets; c) ligating adaptors to the library of cleaved targets of (b) allowing for the library of cleaved targets to be amplified; d) amplifying the library of cleaved targets such that cleaved products containing the randomized PAM sequence are enriched, thereby producing a library of enriched PAM-sided targets; e) sequencing the library of (a) and the library of enriched PAM-sided targets of (d) and identifying the nucleotide sequence adjacent to the cleaved targets of (b) on either strand of the plasmid DNA, wherein said nucleotide sequence represents a putative Protospacer-Adjacent-Motif sequence; and, f) determining the fold enrichment of each nucleotide within the putative Protospacer-Adjacent-Motif sequence relative to the plasmid DNA library of (a).
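Step (f) above can be illustrated with a short calculation. The following is a minimal sketch, assuming that PAM sequences of equal length have already been extracted from the sequencing reads of the starting library of (a) and of the enriched library of (d). Fold enrichment is taken here as the ratio of the per-position nucleotide frequencies in the two libraries, which is one plausible reading of step (f) and is not necessarily the exact computation used. The function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def position_frequencies(pams: list) -> list:
    """Fraction of each base at each position across a list of equal-length PAMs."""
    length = len(pams[0])
    freqs = []
    for i in range(length):
        counts = Counter(pam[i] for pam in pams)
        freqs.append({b: counts[b] / len(pams) for b in BASES})
    return freqs


def fold_enrichment(initial_pams: list, enriched_pams: list) -> list:
    """Per-position, per-base ratio of enriched to initial frequency.

    A base that is absent from the initial library at a position yields NaN,
    because its enrichment cannot be estimated.
    """
    initial = position_frequencies(initial_pams)
    enriched = position_frequencies(enriched_pams)
    return [
        {
            b: (enriched[i][b] / initial[i][b]) if initial[i][b] > 0 else float("nan")
            for b in BASES
        }
        for i in range(len(initial))
    ]
```

In this sketch a value near 1 indicates no preference at that position, whereas values well above 1 indicate a base favored by the guide RNA/Cas endonuclease complex.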

The randomized PAM libraries described herein can also be used in combination with an immunoprecipitation-then-sequencing approach using dCas9 for further PAM discovery. The randomized PAM libraries can also be put on a microchip followed by cleaving the chip-array library. The randomized PAM libraries described herein can also be used in combination with phage display as a method to identify PAMs (Isalan, M., Klug, A. and Choo, Y. (2001) A rapid, generally applicable method to engineer zinc fingers illustrated by targeting the HIV-1 promoter. Nat. Biotechnol., 19, 656-660; Dreier, B., Fuller, R. P., Segal, D. J., Lund, C., Blancafort, P., Huber, A., Koksch, B. and Barbas, C. F., III (2005) Development of zinc finger domains for recognition of the 5′-CNN-3′ family DNA sequences and their use in the construction of artificial transcription factors. J. Biol. Chem., 280, 35588-35597).

In one embodiment of the disclosure, the method comprises a method for identification of a tracrRNA of an organism, the method comprising: a) providing a first single guide RNA candidate comprising a chimeric non-naturally occurring crRNA comprising a variable targeting domain capable of hybridizing to a target sequence in the genome of a cell, linked to a first nucleotide sequence representing the sense expression of a candidate tracrRNA naturally occurring in said organism; b) providing a second single guide RNA candidate comprising a chimeric non-naturally occurring crRNA comprising a variable targeting domain capable of hybridizing to a target sequence in the genome of said cell, linked to a second nucleotide sequence representing the sense expression of a candidate tracrRNA naturally occurring in said organism; c) providing to the first and second single guide RNA candidates a Cas endonuclease protein, wherein said Cas endonuclease protein can form a complex with either the first single guide RNA candidate or the second single guide RNA candidate, wherein said complex is capable of introducing a double strand break into said target sequence; and d) identification of the first or second guide RNA candidate and its tracrRNA component that complexes to the Cas endonuclease of (c) and results in cleavage of the target sequence in the genome of said cell.

In one embodiment of the disclosure, the method comprises a method for identification of a tracrRNA of an organism, the method comprising: a) identifying a CRISPR array repeat sequence in a genomic locus of said organism; b) aligning the CRISPR array repeat sequence of (a) with the sequence of the genomic locus of (a) and identifying an antirepeat sequence that encodes a tracrRNA; and, c) determining the transcriptional direction of the tracrRNA.
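Step (b) above can be approximated computationally. The following is a simplified sketch that scans a genomic locus with ungapped windows for regions partially matching the CRISPR repeat or its reverse complement, while skipping the CRISPR array itself. A real analysis would more likely use a gapped local alignment tool; the identity threshold, the array_span argument, and the function names are assumptions made for illustration. The strand label reported is relative to the orientation of the supplied repeat only and does not by itself establish the transcriptional direction of step (c).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def find_antirepeat_candidates(
    locus: str,
    repeat: str,
    array_span: tuple,          # (start, end) of the CRISPR array, end exclusive
    min_identity: float = 0.7,  # illustrative threshold only
) -> list:
    """Return (start, end, strand, identity) hits sorted by decreasing identity.

    strand "+" : the window reads as the reverse complement of the repeat
    strand "-" : the window reads as the repeat itself
    """
    locus, repeat = locus.upper(), repeat.upper()
    n = len(repeat)
    queries = {"+": reverse_complement(repeat), "-": repeat}
    hits = []
    for start in range(len(locus) - n + 1):
        end = start + n
        if start < array_span[1] and end > array_span[0]:
            continue  # window overlaps the CRISPR array; not an antirepeat
        window = locus[start:end]
        for strand, query in queries.items():
            identity = ungapped_identity(window, query)
            if identity >= min_identity:
                hits.append((start, end, strand, identity))
    return sorted(hits, key=lambda h: -h[3])
```

The highest-scoring hits outside the array are only candidate antirepeats; each would still need to be examined for flanking promoter and terminator features before a tracrRNA is assigned.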

In one embodiment of the disclosure, the method comprises a method for designing a single guide RNA, the method comprising: a) aligning a tracrRNA sequence with a CRISPR array repeat sequence from a genomic locus of an organism, wherein said CRISPR array repeat sequence comprises a crRNA sequence; b) deducing the transcriptional direction of the CRISPR array, thereby also deducing the crRNA sequence; and, c) designing a single guide RNA comprising said tracrRNA and crRNA sequences.

In one embodiment of the disclosure, the method comprises a method for producing target sequences, the method comprising: a) identifying a polynucleotide of interest; b) introducing a Protospacer-Adjacent-Motif (PAM) sequence adjacent to said polynucleotide of interest, wherein said PAM sequence comprises the nucleotide sequence NNNNCND, thereby creating a target site for a guide RNA/Cas9 endonuclease complex; and, c) identifying a polynucleotide of interest.
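The PAM sequence NNNNCND recited above is written in standard IUPAC nucleotide codes, in which N is any base and D is A, G or T. As an illustrative sketch only, such a motif can be expanded into a regular expression and used to test whether a candidate sequence carries a matching PAM. The 20 nt protospacer length and the placement of the PAM immediately 3′ of the protospacer on the strand supplied are assumptions made for this example, and the function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def pam_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC motif such as 'NNNNCND' into a regular expression."""
    return re.compile("".join(IUPAC[code] for code in motif.upper()))


def matches_pam(seq: str, motif: str = "NNNNCND") -> bool:
    """True if seq, taken as a whole, conforms to the IUPAC motif."""
    return pam_regex(motif).fullmatch(seq.upper()) is not None


def find_target_sites(dna: str, protospacer_len: int = 20, motif: str = "NNNNCND") -> list:
    """List (position, protospacer, PAM) for every window on the given strand
    whose 3'-adjacent sequence conforms to the motif."""
    dna = dna.upper()
    pattern = pam_regex(motif)
    pam_len = len(motif)
    sites = []
    for i in range(len(dna) - protospacer_len - pam_len + 1):
        pam = dna[i + protospacer_len:i + protospacer_len + pam_len]
        if pattern.fullmatch(pam):
            sites.append((i, dna[i:i + protospacer_len], pam))
    return sites
```

For example, the sequence ACGTCAG conforms to NNNNCND, whereas ACGTCAC does not because its final base is C. Only one strand is scanned in this sketch; the reverse complement would need to be searched separately.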

Polynucleotides of interest are further described herein and include polynucleotides reflective of the commercial markets and interests of those involved in the development of the crop. Crops and markets of interest change, and as developing nations open up world markets, new crops and technologies will emerge also. In addition, as our understanding of agronomic traits and characteristics such as yield and heterosis increases, the choice of genes for genetic engineering will change accordingly.

Further provided are methods for identifying at least one plant cell comprising in its genome a polynucleotide of interest integrated at the target site. A variety of methods are available for identifying those plant cells with insertion into the genome at or near to the target site without using a screenable marker phenotype. Such methods can be viewed as directly analyzing a target sequence to detect any change in the target sequence, including but not limited to PCR methods, sequencing methods, nuclease digestion, Southern blots, and any combination thereof. See, for example, U.S. patent application Ser. No. 12/147,834, herein incorporated by reference to the extent necessary for the methods described herein. The method also comprises recovering a plant from the plant cell comprising a polynucleotide of interest integrated into its genome. The plant may be sterile or fertile. It is recognized that any polynucleotide of interest can be provided, integrated into the plant genome at the target site, and expressed in a plant.

Polynucleotides/polypeptides of interest include, but are not limited to, herbicide-resistance coding sequences, insecticidal coding sequences, nematicidal coding sequences, antimicrobial coding sequences, antifungal coding sequences, antiviral coding sequences, abiotic and biotic stress tolerance coding sequences, or sequences modifying plant traits such as yield, grain quality, nutrient content, starch quality and quantity, nitrogen fixation and/or utilization, fatty acids, and oil content and/or composition. More specific polynucleotides of interest include, but are not limited to, genes that improve crop yield, polypeptides that improve desirability of crops, genes encoding proteins conferring resistance to abiotic stress, such as drought, nitrogen, temperature, salinity, toxic metals or trace elements, or those conferring resistance to toxins such as pesticides and herbicides, or to biotic stress, such as attacks by fungi, viruses, bacteria, insects, and nematodes, and development of diseases associated with these organisms. General categories of genes of interest include, for example, those genes involved in information, such as zinc fingers, those involved in communication, such as kinases, and those involved in housekeeping, such as heat shock proteins. More specific categories of transgenes, for example, include genes encoding important traits for agronomics, insect resistance, disease resistance, herbicide resistance, fertility or sterility, grain characteristics, and commercial products. Genes of interest include, generally, those involved in oil, starch, carbohydrate, or nutrient metabolism as well as those affecting kernel size, sucrose loading, and the like that can be stacked or used in combination with other traits, such as but not limited to herbicide resistance, described herein.

Agronomically important traits such as oil, starch, and protein content can be genetically altered in addition to using traditional breeding methods. Modifications include increasing content of oleic acid, saturated and unsaturated oils, increasing levels of lysine and sulfur, providing essential amino acids, and also modification of starch. Hordothionin protein modifications are described in U.S. Pat. Nos. 5,703,049, 5,885,801, 5,885,802, and 5,990,389, herein incorporated by reference.

Polynucleotide sequences of interest may encode proteins involved in providing disease or pest resistance. By “disease resistance” or “pest resistance” is intended that the plants avoid the harmful symptoms that are the outcome of the plant-pathogen interactions. Pest resistance genes may encode resistance to pests that have great yield drag such as rootworm, cutworm, European Corn Borer, and the like. Disease resistance and insect resistance genes such as lysozymes or cecropins for antibacterial protection, or proteins such as defensins, glucanases or chitinases for antifungal protection, or Bacillus thuringiensis endotoxins, protease inhibitors, collagenases, lectins, or glycosidases for controlling nematodes or insects are all examples of useful gene products. Genes encoding disease resistance traits include detoxification genes, such as against fumonisin (U.S. Pat. No. 5,792,931); avirulence (avr) and disease resistance (R) genes (Jones et al. (1994) Science 266:789; Martin et al. (1993) Science 262:1432; and Mindrinos et al. (1994) Cell 78:1089); and the like. Insect resistance genes may encode resistance to pests that have great yield drag such as rootworm, cutworm, European Corn Borer, and the like. Such genes include, for example, Bacillus thuringiensis toxic protein genes (U.S. Pat. Nos. 5,366,892; 5,747,450; 5,736,514; 5,723,756; 5,593,881; and Geiser et al. (1986) Gene 48:109); and the like.

An “herbicide resistance protein” or a protein resulting from expression of an “herbicide resistance-encoding nucleic acid molecule” includes proteins that confer upon a cell the ability to tolerate a higher concentration of an herbicide than cells that do not express the protein, or to tolerate a certain concentration of an herbicide for a longer period of time than cells that do not express the protein. Herbicide resistance traits may be introduced into plants by genes coding for resistance to herbicides that act to inhibit the action of acetolactate synthase (ALS), in particular the sulfonylurea-type herbicides, genes coding for resistance to herbicides that act to inhibit the action of glutamine synthase, such as phosphinothricin or basta (e.g., the bar gene), glyphosate (e.g., the EPSP synthase gene and the GAT gene), HPPD inhibitors (e.g., the HPPD gene) or other such genes known in the art. See, for example, U.S. Pat. Nos. 7,626,077, 5,310,667, 5,866,775, 6,225,114, 6,248,876, 7,169,970, 6,867,293, and U.S. Provisional Application No. 61/401,456, each of which is herein incorporated by reference. The bar gene encodes resistance to the herbicide basta, the nptII gene encodes resistance to the antibiotics kanamycin and geneticin, and the ALS-gene mutants encode resistance to the herbicide chlorsulfuron.

Sterility genes can also be encoded in an expression cassette and provide an alternative to physical detasseling. Examples of genes used in such ways include male fertility genes such as MS26 (see for example U.S. Pat. Nos. 7,098,388, 7,517,975, 7,612,251), MS45 (see for example U.S. Pat. Nos. 5,478,369, 6,265,640) or MSCA1 (see for example U.S. Pat. No. 7,919,676). Maize plants (Zea mays L.) can be bred by both self-pollination and cross-pollination techniques. Maize has male flowers, located on the tassel, and female flowers, located on the ear, on the same plant. It can self-pollinate (“selfing”) or cross pollinate. Natural pollination occurs in maize when wind blows pollen from the tassels to the silks that protrude from the tops of the incipient ears. Pollination may be readily controlled by techniques known to those of skill in the art. The development of maize hybrids requires the development of homozygous inbred lines, the crossing of these lines, and the evaluation of the crosses. Pedigree breeding and recurrent selections are two of the breeding methods used to develop inbred lines from populations. Breeding programs combine desirable traits from two or more inbred lines or various broad-based sources into breeding pools from which new inbred lines are developed by selfing and selection of desired phenotypes. A hybrid maize variety is the cross of two such inbred lines, each of which may have one or more desirable characteristics lacked by the other or which complement the other. The new inbreds are crossed with other inbred lines and the hybrids from these crosses are evaluated to determine which have commercial potential. The hybrid progeny of the first generation is designated F1. The F1 hybrid is more vigorous than its inbred parents. This hybrid vigor, or heterosis, can be manifested in many ways, including increased vegetative growth and increased yield.

Hybrid maize seed can be produced by a male sterility system incorporating manual detasseling. To produce hybrid seed, the male tassel is removed from the growing female inbred parent, which can be planted in various alternating row patterns with the male inbred parent. Consequently, providing that there is sufficient isolation from sources of foreign maize pollen, the ears of the female inbred will be fertilized only with pollen from the male inbred. The resulting seed is therefore hybrid (F1) and will form hybrid plants.

Field variation impacting plant development can result in plants tasseling after manual detasseling of the female parent is completed. Or, a female inbred plant tassel may not be completely removed during the detasseling process. In any event, the result is that the female plant will successfully shed pollen and some female plants will be self-pollinated. This will result in seed of the female inbred being harvested along with the hybrid seed which is normally produced. Female inbred seed does not exhibit heterosis and therefore is not as productive as F1 seed. In addition, the presence of female inbred seed can represent a germplasm security risk for the company producing the hybrid.

Alternatively, the female inbred can be mechanically detasseled by machine. Mechanical detasseling is approximately as reliable as hand detasseling, but is faster and less costly. However, most detasseling machines produce more damage to the plants than hand detasseling. Thus, no form of detasseling is presently entirely satisfactory, and a need continues to exist for alternatives which further reduce production costs and eliminate self-pollination of the female parent in the production of hybrid seed.

Mutations that cause male sterility in plants have the potential to be useful in methods for hybrid seed production for crop plants such as maize and can lower production costs by eliminating the need for the labor-intensive removal of male flowers (also known as de-tasseling) from the maternal parent plants used as a hybrid parent. Mutations that cause male sterility in maize have been produced by a variety of methods such as X-rays or UV-irradiations, chemical treatments, or transposable element insertions (ms23, ms25, ms26, ms32) (Chaubal et al. (2000) Am J Bot 87:1193-1201). Conditional regulation of fertility genes through fertility/sterility “molecular switches” could enhance the options for designing new male-sterility systems for crop improvement (Unger et al. (2002) Transgenic Res 11:455-465).

Besides identification of novel genes impacting male fertility, there remains a need to provide a reliable system of producing genetic male sterility.

In U.S. Pat. No. 5,478,369, a method is described by which the Ms45 male fertility gene was tagged and cloned on maize chromosome 9. Previously, there had been described a male fertility gene on chromosome 9, ms2, which had never been cloned and sequenced. It is not allelic to the gene referred to in the '369 patent. See Albertsen, M. and Phillips, R. L., “Developmental Cytology of 13 Genetic Male Sterile Loci in Maize” Canadian Journal of Genetics & Cytology 23:195-208 (January 1981). The only fertility gene cloned before that had been the Arabidopsis gene described at Aarts, et al., supra.

Furthermore, it is recognized that the polynucleotide of interest may also comprise antisense sequences complementary to at least a portion of the messenger RNA (mRNA) for a targeted gene sequence of interest. Antisense nucleotides are constructed to hybridize with the corresponding mRNA. Modifications of the antisense sequences may be made as long as the sequences hybridize to and interfere with expression of the corresponding mRNA. In this manner, antisense constructions having 70%, 80%, or 85% sequence identity to the corresponding antisense sequences may be used. Furthermore, portions of the antisense nucleotides may be used to disrupt the expression of the target gene. Generally, sequences of at least 50 nucleotides, 100 nucleotides, 200 nucleotides, or greater may be used.
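
As an illustration of the complementarity described above, the following minimal sketch derives the antisense strand of an mRNA fragment by reverse complementation. The function name and the example fragment are hypothetical and are not taken from the disclosure; only the four standard RNA bases are handled.

```python
# Watson-Crick pairing for the four standard RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(mrna: str) -> str:
    """Return the antisense strand of an mRNA fragment, written 5' to 3'.

    The antisense strand is the reverse complement of the input, so it can
    base-pair with the mRNA along its whole length.
    """
    return "".join(RNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


# Hypothetical six-nucleotide fragment, used only to show the call.
fragment = "AUGGCC"
print(antisense_rna(fragment))
```

Applying the function twice returns the original fragment, which is a quick sanity check on the pairing table.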

In addition, the polynucleotide of interest may also be used in the sense orientation to suppress the expression of endogenous genes in plants. Methods for suppressing gene expression in plants using polynucleotides in the sense orientation are known in the art. The methods generally involve transforming plants with a DNA construct comprising a promoter that drives expression in a plant operably linked to at least a portion of a nucleotide sequence that corresponds to the transcript of the endogenous gene. Typically, such a nucleotide sequence has substantial sequence identity to the sequence of the transcript of the endogenous gene, generally greater than about 65% sequence identity, about 85% sequence identity, or greater than about 95% sequence identity. See, U.S. Pat. Nos. 5,283,184 and 5,034,323; herein incorporated by reference.

The polynucleotide of interest can also be a phenotypic marker. A phenotypic marker is a screenable or a selectable marker that includes visual markers and selectable markers whether it is a positive or negative selectable marker. Any phenotypic marker can be used. Specifically, a selectable or screenable marker comprises a DNA segment that allows one to identify, or select for or against a molecule or a cell that contains it, often under particular conditions. These markers can encode an activity, such as, but not limited to, production of RNA, peptide, or protein, or can provide a binding site for RNA, peptides, proteins, inorganic and organic compounds or compositions and the like.

Examples of selectable markers include, but are not limited to, DNA segments that comprise restriction enzyme sites; DNA segments that encode products which provide resistance against otherwise toxic compounds including antibiotics, such as spectinomycin, ampicillin, kanamycin, tetracycline, Basta, neomycin phosphotransferase II (NEO) and hygromycin phosphotransferase (HPT); DNA segments that encode products which are otherwise lacking in the recipient cell (e.g., tRNA genes, auxotrophic markers); DNA segments that encode products which can be readily identified (e.g., phenotypic markers such as β-galactosidase, GUS; fluorescent proteins such as green fluorescent protein (GFP), cyan (CFP), yellow (YFP), red (RFP), and cell surface proteins); the generation of new primer sites for PCR (e.g., the juxtaposition of two DNA sequences not previously juxtaposed); the inclusion of DNA sequences not acted upon or acted upon by a restriction endonuclease or other DNA modifying enzyme, chemical, etc.; and the inclusion of a DNA sequence required for a specific modification (e.g., methylation) that allows its identification.

Additional selectable markers include genes that confer resistance to herbicidal compounds, such as glufosinate ammonium, bromoxynil, imidazolinones, and 2,4-dichlorophenoxyacetate (2,4-D). See for example, Yarranton, (1992) Curr Opin Biotech 3:506-11; Christopherson et al., (1992) Proc. Natl. Acad. Sci. USA 89:6314-8; Yao et al., (1992) Cell 71:63-72; Reznikoff, (1992) Mol Microbiol 6:2419-22; Hu et al., (1987) Cell 48:555-66; Brown et al., (1987) Cell 49:603-12; Figge et al., (1988) Cell 52:713-22; Deuschle et al., (1989) Proc. Natl. Acad. Sci. USA 86:5400-4; Fuerst et al., (1989) Proc. Natl. Acad. Sci. USA 86:2549-53; Deuschle et al., (1990) Science 248:480-3; Gossen, (1993) Ph.D. Thesis, University of Heidelberg; Reines et al., (1993) Proc. Natl. Acad. Sci. USA 90:1917-21; Labow et al., (1990) Mol Cell Biol 10:3343-56; Zambretti et al., (1992) Proc. Natl. Acad. Sci. USA 89:3952-6; Baim et al., (1991) Proc. Natl. Acad. Sci. USA 88:5072-6; Wyborski et al., (1991) Nucleic Acids Res 19:4647-53; Hillen and Wissman, (1989) Topics Mol Struc Biol 10:143-62; Degenkolb et al., (1991) Antimicrob Agents Chemother 35:1591-5; Kleinschnidt et al., (1988) Biochemistry 27:1094-104; Bonin, (1993) Ph.D. Thesis, University of Heidelberg; Gossen et al., (1992) Proc. Natl. Acad. Sci. USA 89:5547-51; Oliva et al., (1992) Antimicrob Agents Chemother 36:913-9; Hlavka et al., (1985) Handbook of Experimental Pharmacology, Vol. 78 (Springer-Verlag, Berlin); Gill et al., (1988) Nature 334:721-4. Commercial traits can also be encoded on a gene or genes that could increase, for example, starch for ethanol production, or provide expression of proteins. Another important commercial use of transformed plants is the production of polymers and bioplastics such as described in U.S. Pat. No. 5,602,321. Genes such as β-Ketothiolase, PHBase (polyhydroxybutyrate synthase), and acetoacetyl-CoA reductase (see Schubert et al. (1988) J. Bacteriol. 170:5837-5847) facilitate expression of polyhydroxyalkanoates (PHAs).

Exogenous products include plant enzymes and products as well as those from other sources including prokaryotes and other eukaryotes. Such products include enzymes, cofactors, hormones, and the like. The level of proteins, particularly modified proteins having improved amino acid distribution to improve the nutrient value of the plant, can be increased. This is achieved by the expression of such proteins having enhanced amino acid content.

The transgenes, recombinant DNA molecules, DNA sequences of interest, and polynucleotides of interest can comprise one or more DNA sequences for gene silencing. Methods for gene silencing involving the expression of DNA sequences in plants are known in the art and include, but are not limited to, cosuppression, antisense suppression, double-stranded RNA (dsRNA) interference, hairpin RNA (hpRNA) interference, intron-containing hairpin RNA (ihpRNA) interference, transcriptional gene silencing, and micro RNA (miRNA) interference.

As used herein, “nucleic acid” means a polynucleotide and includes a single or a double-stranded polymer of deoxyribonucleotide or ribonucleotide bases. Nucleic acids may also include fragments and modified nucleotides. Thus, the terms “polynucleotide”, “nucleic acid sequence”, “nucleotide sequence” and “nucleic acid fragment” are used interchangeably to denote a polymer of RNA and/or DNA that is single- or double-stranded, optionally containing synthetic, non-natural, or altered nucleotide bases. Nucleotides (usually found in their 5′-monophosphate form) are referred to by their single letter designation as follows: “A” for adenosine or deoxyadenosine (for RNA or DNA, respectively), “C” for cytosine or deoxycytosine, “G” for guanosine or deoxyguanosine, “U” for uridine, “T” for deoxythymidine, “R” for purines (A or G), “Y” for pyrimidines (C or T), “K” for G or T, “H” for A or C or T, “I” for inosine, and “N” for any nucleotide.
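
The single letter designations listed above can be treated as sets of permitted bases. The following sketch, limited to DNA and to the codes named in this paragraph, checks whether a concrete sequence is consistent with a pattern written in those codes. The names `DNA_CODES` and `matches_pattern` are illustrative assumptions, and “U” and “I” are deliberately left out of this simplified table.

```python
# Each designation maps to the set of DNA bases it may stand for,
# following the list given in the definition above.
DNA_CODES = {
    "A": {"A"},
    "C": {"C"},
    "G": {"G"},
    "T": {"T"},
    "R": {"A", "G"},            # purines
    "Y": {"C", "T"},            # pyrimidines
    "K": {"G", "T"},
    "H": {"A", "C", "T"},
    "N": {"A", "C", "G", "T"},  # any nucleotide
}


def matches_pattern(pattern: str, sequence: str) -> bool:
    """Return True if every base of `sequence` is allowed by `pattern`."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in DNA_CODES[code]
        for code, base in zip(pattern.upper(), sequence.upper())
    )


print(matches_pattern("RYKH", "ACGT"))  # each position is permitted
print(matches_pattern("H", "G"))        # G is not among A, C, T
```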

“Open reading frame” is abbreviated ORF.

The terms “subfragment that is functionally equivalent” and “functionally equivalent subfragment” are used interchangeably herein. These terms refer to a portion or subsequence of an isolated nucleic acid fragment in which the ability to alter gene expression or produce a certain phenotype is retained whether or not the fragment or subfragment encodes an active enzyme. For example, the fragment or subfragment can be used in the design of genes to produce the desired phenotype in a transformed plant. Genes can be designed for use in suppression by linking a nucleic acid fragment or subfragment thereof, whether or not it encodes an active enzyme, in the sense or antisense orientation relative to a plant promoter sequence.

The term “conserved domain” or “motif” means a set of amino acids conserved at specific positions along an aligned sequence of evolutionarily related proteins. While amino acids at other positions can vary between homologous proteins, amino acids that are highly conserved at specific positions indicate amino acids that are essential to the structure, the stability, or the activity of a protein. Because they are identified by their high degree of conservation in aligned sequences of a family of protein homologues, they can be used as identifiers, or “signatures”, to determine if a protein with a newly determined sequence belongs to a previously identified protein family.

Polynucleotide and polypeptide sequences, variants thereof, and the structural relationships of these sequences can be described by the terms “homology”, “homologous”, “substantially identical”, “substantially similar” and “corresponding substantially” which are used interchangeably herein. These refer to polypeptide or nucleic acid fragments wherein changes in one or more amino acids or nucleotide bases do not affect the function of the molecule, such as the ability to mediate gene expression or to produce a certain phenotype. These terms also refer to modification(s) of nucleic acid fragments that do not substantially alter the functional properties of the resulting nucleic acid fragment relative to the initial, unmodified fragment. These modifications include deletion, substitution, and/or insertion of one or more nucleotides in the nucleic acid fragment.

Substantially similar nucleic acid sequences encompassed may be defined by their ability to hybridize (under moderately stringent conditions, e.g., 0.5×SSC, 0.1% SDS, 60° C.) with the sequences exemplified herein, or to any portion of the nucleotide sequences disclosed herein and which are functionally equivalent to any of the nucleic acid sequences disclosed herein. Stringency conditions can be adjusted to screen for moderately similar fragments, such as homologous sequences from distantly related organisms, to highly similar fragments, such as genes that duplicate functional enzymes from closely related organisms. Post-hybridization washes determine stringency conditions.

The term “selectively hybridizes” includes reference to hybridization, under stringent hybridization conditions, of a nucleic acid sequence to a specified nucleic acid target sequence to a detectably greater degree (e.g., at least 2-fold over background) than its hybridization to non-target nucleic acid sequences and to the substantial exclusion of non-target nucleic acids. Selectively hybridizing sequences typically have about at least 80% sequence identity, or 90% sequence identity, up to and including 100% sequence identity (i.e., fully complementary) with each other.

The term “stringent conditions” or “stringent hybridization conditions” includes reference to conditions under which a probe will selectively hybridize to its target sequence in an in vitro hybridization assay. Stringent conditions are sequence-dependent and will be different in different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences can be identified which are 100% complementary to the probe (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing). Generally, a probe is less than about 1000 nucleotides in length, optionally less than 500 nucleotides in length.

Typically, stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salt(s)) at pH 7.0 to 8.3, and at least about 30° C. for short probes (e.g., 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at 37° C., and a wash in 1× to 2×SSC (20×SSC=3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55° C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.5× to 1×SSC at 55 to 60° C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.1×SSC at 60 to 65° C.
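
The wash strengths above are given as multiples of SSC, so the underlying salt concentrations follow by simple dilution from the stated 20×SSC composition. The sketch below performs that arithmetic. The function names are illustrative, and the sodium ion estimate assumes, as a simplification not stated in the text, that NaCl contributes one and trisodium citrate three sodium ions per formula unit.

```python
# Composition of the 20x SSC stock as stated in the text.
STOCK_FOLD = 20.0
STOCK_NACL_M = 3.0
STOCK_CITRATE_M = 0.3


def ssc_molarity(fold: float) -> tuple[float, float]:
    """Return (NaCl, trisodium citrate) molarity for a given SSC strength."""
    scale = fold / STOCK_FOLD
    return STOCK_NACL_M * scale, STOCK_CITRATE_M * scale


def sodium_molarity(fold: float) -> float:
    """Estimate total Na+ molarity, assuming complete dissociation."""
    nacl, citrate = ssc_molarity(fold)
    return nacl + 3.0 * citrate


# Wash strengths named in the exemplary low, moderate and high
# stringency conditions.
for fold in (2.0, 1.0, 0.5, 0.1):
    nacl, citrate = ssc_molarity(fold)
    print(f"{fold}x SSC: {nacl:.4f} M NaCl, {citrate:.4f} M citrate, "
          f"about {sodium_molarity(fold):.4f} M Na+")
```

Under these assumptions a 0.1×SSC wash contains 0.015 M NaCl, which is why it is the most stringent of the listed washes.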

“Sequence identity” or “identity” in the context of nucleic acid or polypeptide sequences refers to the nucleic acid bases or amino acid residues in two sequences that are the same when aligned for maximum correspondence over a specified comparison window.

The term “percentage of sequence identity” refers to the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the results by 100 to yield the percentage of sequence identity. Useful examples of percent sequence identities include, but are not limited to, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%, or any integer percentage from 50% to 100%. These identities can be determined using any of the programs described herein.
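
The calculation described above can be written out directly for two sequences that have already been aligned. This is a minimal sketch: the function name and the use of “-” as the gap character are assumptions, and the alignment itself is taken as given rather than computed.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over an aligned comparison window.

    Matched positions are columns where both sequences carry the same
    residue; the denominator is the total number of columns in the window,
    including columns that contain a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matched / len(aligned_a)


# Six aligned columns, five of which match.
print(round(percent_identity("ACGT-A", "ACGTTA"), 2))
```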

Sequence alignments and percent identity or similarity calculations may be determined using a variety of comparison methods designed to detect homologous sequences including, but not limited to, the MegAlign™ program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, Wis.). Within the context of this application it will be understood that where sequence analysis software is used for analysis, the results of the analysis will be based on the “default values” of the program referenced, unless otherwise specified. As used herein “default values” will mean any set of values or parameters that originally load with the software when first initialized.

The “Clustal V method of alignment” corresponds to the alignment method labeled Clustal V (described by Higgins and Sharp, (1989) CABIOS 5:151-153; Higgins et al., (1992) Comput Appl Biosci 8:189-191) and found in the MegAlign™ program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, Wis.). For multiple alignments, the default values correspond to GAP PENALTY=10 and GAP LENGTH PENALTY=10. Default parameters for pairwise alignments and calculation of percent identity of protein sequences using the Clustal method are KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5. For nucleic acids these parameters are KTUPLE=2, GAP PENALTY=5, WINDOW=4 and DIAGONALS SAVED=4. After alignment of the sequences using the Clustal V program, it is possible to obtain a “percent identity” by viewing the “sequence distances” table in the same program.

The “Clustal W method of alignment” corresponds to the alignment method labeled Clustal W (described by Higgins and Sharp, (1989) CABIOS 5:151-153; Higgins et al., (1992) Comput Appl Biosci 8:189-191) and found in the MegAlign™ v6.1 program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, Wis.). Default parameters for multiple alignment are GAP PENALTY=10, GAP LENGTH PENALTY=0.2, Delay Divergen Seqs (%)=30, DNA Transition Weight=0.5, Protein Weight Matrix=Gonnet Series, and DNA Weight Matrix=IUB. After alignment of the sequences using the Clustal W program, it is possible to obtain a “percent identity” by viewing the “sequence distances” table in the same program.

Unless otherwise stated, sequence identity/similarity values provided herein refer to the value obtained using GAP Version 10 (GCG, Accelrys, San Diego, Calif.) using the following parameters: % identity and % similarity for a nucleotide sequence using a gap creation penalty weight of 50 and a gap length extension penalty weight of 3, and the nwsgapdna.cmp scoring matrix; % identity and % similarity for an amino acid sequence using a GAP creation penalty weight of 8 and a gap length extension penalty of 2, and the BLOSUM62 scoring matrix (Henikoff and Henikoff, (1992) Proc. Natl. Acad. Sci. USA 89:10915). GAP uses the algorithm of Needleman and Wunsch, (1970) J Mol Biol 48:443-53, to find an alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps. GAP considers all possible alignments and gap positions and creates the alignment with the largest number of matched bases and the fewest gaps, using a gap creation penalty and a gap extension penalty in units of matched bases.
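
For readers unfamiliar with the Needleman and Wunsch approach, the following is a simplified sketch of global alignment by dynamic programming. It is not the GAP program: it uses a single linear gap penalty and a plain match/mismatch score in place of separate gap creation and extension penalties and a scoring matrix, and the default score values are arbitrary assumptions chosen for illustration.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Globally align `a` and `b`; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: best of diagonal (pair), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

The two returned strings have equal length, so they can be passed to any column-by-column identity calculation.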

“BLAST” is a searching algorithm provided by the National Center for Biotechnology Information (NCBI) used to find regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches to identify sequences having sufficient similarity to a query sequence such that the similarity would not be predicted to have occurred randomly. BLAST reports the identified sequences and their local alignment to the query sequence.

It is well understood by one skilled in the art that many levels of sequence identity are useful in identifying polypeptides from other species or modified naturally or synthetically wherein such polypeptides have the same or similar function or activity. Useful examples of percent identities include, but are not limited to, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%, or any integer percentage from 50% to 100%. Indeed, any integer amino acid identity from 50% to 100% may be useful in describing the present disclosure, such as 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%.

“Gene” includes a nucleic acid fragment that expresses a functional molecule such as, but not limited to, a specific protein, including regulatory sequences preceding (5′ non-coding sequences) and following (3′ non-coding sequences) the coding sequence. “Native gene” refers to a gene as found in nature with its own regulatory sequences.

A “mutated gene” is a gene that has been altered through human intervention. Such a “mutated gene” has a sequence that differs from the sequence of the corresponding non-mutated gene by at least one nucleotide addition, deletion, or substitution. In certain embodiments of the disclosure, the mutated gene comprises an alteration that results from a guide polynucleotide/Cas endonuclease system as disclosed herein. A mutated plant is a plant comprising a mutated gene.

As used herein, a “targeted mutation” is a mutation in a native gene that was made by altering a target sequence within the native gene using a method involving a double-strand-break-inducing agent that is capable of inducing a double-strand break in the DNA of the target sequence as disclosed herein or known in the art.

The guide RNA/Cas endonuclease induced targeted mutation can occur in a nucleotide sequence that is located within or outside a genomic target site that is recognized and cleaved by a Cas endonuclease.

The term “genome” as it applies to a plant cell encompasses not only chromosomal DNA found within the nucleus, but organelle DNA found within subcellular components (e.g., mitochondria, or plastid) of the cell.

A “codon-modified gene” or “codon-preferred gene” or “codon-optimized gene” is a gene having its frequency of codon usage designed to mimic the frequency of preferred codon usage of the host cell.

An “allele” is one of several alternative forms of a gene occupying a given locus on a chromosome. When all the alleles present at a given locus on a chromosome are the same, that plant is homozygous at that locus. If the alleles present at a given locus on a chromosome differ, that plant is heterozygous at that locus.

“Coding sequence” refers to a polynucleotide sequence which codes for a specific amino acid sequence. “Regulatory sequences” refer to nucleotide sequences located upstream (5′ non-coding sequences), within, or downstream (3′ non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to: promoters, translation leader sequences, 5′ untranslated sequences, 3′ untranslated sequences, introns, polyadenylation target sequences, RNA processing sites, effector binding sites, and stem-loop structures.

“A plant-optimized nucleotide sequence” is a nucleotide sequence that has been optimized for increased expression in plants, particularly for increased expression in plants or in one or more plants of interest. For example, a plant-optimized nucleotide sequence can be synthesized by modifying a nucleotide sequence encoding a protein such as, for example, a double-strand-break-inducing agent (e.g., an endonuclease) as disclosed herein, using one or more plant-preferred codons for improved expression. See, for example, Campbell and Gowri (1990) Plant Physiol. 92:1-11 for a discussion of host-preferred codon usage.
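
The codon substitution described above can be sketched as a synonymous recoding step that leaves the encoded protein unchanged. The example below uses the standard genetic code; the table `HYPOTHETICAL_PREFERRED` is a placeholder invented for illustration and does not represent measured codon usage for any plant, and the function names are likewise assumptions.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(GENETIC_CODE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: dict[str, str]) -> str:
    """Replace each codon with the preferred synonymous codon, if one is given."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = GENETIC_CODE[codon]
        replacement = preferred.get(amino_acid, codon)
        if GENETIC_CODE[replacement] != amino_acid:
            raise ValueError(f"{replacement} does not encode {amino_acid}")
        out.append(replacement)
    return "".join(out)


# Placeholder preferences, NOT real plant codon usage data.
HYPOTHETICAL_PREFERRED = {"A": "GCC", "L": "CTG"}

original = "ATGGCTTTATAA"  # hypothetical coding sequence: Met-Ala-Leu-stop
optimized = recode(original, HYPOTHETICAL_PREFERRED)
print(optimized)
print(translate(original) == translate(optimized))
```

In practice the preference table would be derived from codon usage in genes expressed in the host of interest, as discussed in the cited references.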

Methods are available in the art for synthesizing plant-preferred genes. See, for example, U.S. Pat. Nos. 5,380,831, and 5,436,391, and Murray et al. (1989) Nucleic Acids Res. 17:477-498, herein incorporated by reference. Additional sequence modifications are known to enhance gene expression in a plant host. These include, for example, elimination of: one or more sequences encoding spurious polyadenylation signals, one or more exon-intron splice site signals, one or more transposon-like repeats, and other such well-characterized sequences that may be deleterious to gene expression. The G-C content of the sequence may be adjusted to levels average for a given plant host, as calculated by reference to known genes expressed in the host plant cell. When possible, the sequence is modified to avoid one or more predicted hairpin secondary mRNA structures. Thus, “a plant-optimized nucleotide sequence” of the present disclosure comprises one or more of such sequence modifications.
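
The G-C content comparison mentioned above amounts to a simple calculation. The following sketch computes the G-C fraction of a candidate sequence and its difference from the average of a set of reference genes. All function names and the toy sequences are hypothetical; real reference genes would be known genes expressed in the host plant cell.

```python
def gc_fraction(sequence: str) -> float:
    """Fraction of positions in `sequence` that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence is empty")
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def host_average_gc(reference_genes: list[str]) -> float:
    """Unweighted mean G-C fraction over a set of reference genes."""
    if not reference_genes:
        raise ValueError("no reference genes supplied")
    return sum(gc_fraction(g) for g in reference_genes) / len(reference_genes)


def gc_gap(candidate: str, reference_genes: list[str]) -> float:
    """Candidate G-C fraction minus the host average (negative: too A-T rich)."""
    return gc_fraction(candidate) - host_average_gc(reference_genes)


# Toy sequences standing in for real host genes.
references = ["ATGC", "GGCC"]
print(host_average_gc(references))
print(gc_gap("ATAT", references))
```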

A promoter is a region of DNA involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. The promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers. An “enhancer” is a DNA sequence that can stimulate promoter activity, and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue-specificity of a promoter. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, and/or comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions. It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of some variation may have identical promoter activity. Promoters that cause a gene to be expressed in most cell types at most times are commonly referred to as “constitutive promoters”.

It has been shown that certain promoters are able to direct RNA synthesis at a higher rate than others. These are called “strong promoters”. Certain other promoters have been shown to direct RNA synthesis at higher levels only in particular types of cells or tissues and are often referred to as “tissue specific promoters”, or “tissue-preferred promoters” if the promoters direct RNA synthesis preferably in certain tissues but also in other tissues at reduced levels. Since patterns of expression of a chimeric gene (or genes) introduced into a plant are controlled using promoters, there is an ongoing interest in the isolation of novel promoters which are capable of controlling the expression of a chimeric gene (or genes) at certain levels in specific tissue types or at specific plant developmental stages.

A plant promoter can include a promoter capable of initiating transcription in a plant cell; for a review of plant promoters, see Potenza et al., (2004) In Vitro Cell Dev Biol 40:1-22. Constitutive promoters include, for example, the core promoter of the Rsyn7 promoter and other constitutive promoters disclosed in WO99/43838 and U.S. Pat. No. 6,072,050; the core CaMV 35S promoter (Odell et al., (1985) Nature 313:810-2); rice actin (McElroy et al., (1990) Plant Cell 2:163-71); ubiquitin (Christensen et al., (1989) Plant Mol Biol 12:619-32; Christensen et al., (1992) Plant Mol Biol 18:675-89); pEMU (Last et al., (1991) Theor Appl Genet 81:581-8); MAS (Velten et al., (1984) EMBO J 3:2723-30); ALS promoter (U.S. Pat. No. 5,659,026), and the like. Other constitutive promoters are described in, for example, U.S. Pat. Nos. 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; 5,608,142 and 6,177,611. In some examples an inducible promoter may be used. Pathogen-inducible promoters induced following infection by a pathogen include, but are not limited to, those regulating expression of PR proteins, SAR proteins, beta-1,3-glucanase, chitinase, etc.

Chemical-regulated promoters can be used to modulate the expression of a gene in a plant through the application of an exogenous chemical regulator. The promoter may be a chemical-inducible promoter, where application of the chemical induces gene expression, or a chemical-repressible promoter, where application of the chemical represses gene expression. Chemical-inducible promoters include, but are not limited to, the maize In2-2 promoter, activated by benzene sulfonamide herbicide safeners (De Veylder et al., (1997) Plant Cell Physiol 38:568-77), the maize GST promoter (GST-II-27, WO93/01294), activated by hydrophobic electrophilic compounds used as pre-emergent herbicides, and the tobacco PR-1a promoter (Ono et al., (2004) Biosci Biotechnol Biochem 68:803-7) activated by salicylic acid. Other chemical-regulated promoters include steroid-responsive promoters (see, for example, the glucocorticoid-inducible promoter (Schena et al., (1991) Proc. Natl. Acad. Sci. USA 88:10421-5; McNellis et al., (1998) Plant J 14:247-257)); tetracycline-inducible and tetracycline-repressible promoters (Gatz et al., (1991) Mol Gen Genet 227:229-37; U.S. Pat. Nos. 5,814,618 and 5,789,156).

Tissue-preferred promoters can be utilized to target enhanced expression within a particular plant tissue. Tissue-preferred promoters include, for example, Kawamata et al., (1997) Plant Cell Physiol 38:792-803; Hansen et al., (1997) Mol Gen Genet 254:337-43; Russell et al., (1997) Transgenic Res 6:157-68; Rinehart et al., (1996) Plant Physiol 112:1331-41; Van Camp et al., (1996) Plant Physiol 112:525-35; Canevascini et al., (1996) Plant Physiol 112:513-524; Lam, (1994) Results Probl Cell Differ 20:181-96; and Guevara-Garcia et al., (1993) Plant J 4:495-505. Leaf-preferred promoters include, for example, Yamamoto et al., (1997) Plant J 12:255-65; Kwon et al., (1994) Plant Physiol 105:357-67; Yamamoto et al., (1994) Plant Cell Physiol 35:773-8; Gotor et al., (1993) Plant J 3:509-18; Orozco et al., (1993) Plant Mol Biol 23:1129-38; Matsuoka et al., (1993) Proc. Natl. Acad. Sci. USA 90:9586-90; Simpson et al., (1985) EMBO J 4:2723-9; Timko et al., (1988) Nature 318:57-8. Root-preferred promoters include, for example, Hire et al., (1992) Plant Mol Biol 20:207-18 (soybean root-specific glutamine synthase gene); Miao et al., (1991) Plant Cell 3:11-22 (cytosolic glutamine synthase (GS)); Keller and Baumgartner, (1991) Plant Cell 3:1051-61 (root-specific control element in the GRP 1.8 gene of French bean); Sanger et al., (1990) Plant Mol Biol 14:433-43 (root-specific promoter of A. tumefaciens mannopine synthase (MAS)); Bogusz et al., (1990) Plant Cell 2:633-41 (root-specific promoters isolated from Parasponia andersonii and Trema tomentosa); Leach and Aoyagi, (1991) Plant Sci 79:69-76 (A. rhizogenes rolC and rolD root-inducing genes); Teeri et al., (1989) EMBO J 8:343-50 (Agrobacterium wound-induced TR1′ and TR2′ genes); VfENOD-GRP3 gene promoter (Kuster et al., (1995) Plant Mol Biol 29:759-72); and rolB promoter (Capana et al., (1994) Plant Mol Biol 25:681-91); phaseolin gene (Murai et al., (1983) Science 23:476-82; Sengopta-Gopalen et al., (1988) Proc. Natl. Acad. Sci. USA 82:3320-4). See also, U.S. Pat. Nos. 5,837,876; 5,750,386; 5,633,363; 5,459,252; 5,401,836; 5,110,732 and 5,023,179.

Seed-preferred promoters include both seed-specific promoters active during seed development, as well as seed-germinating promoters active during seed germination. See, Thompson et al., (1989) BioEssays 10:108. Seed-preferred promoters include, but are not limited to, Cim1 (cytokinin-induced message); cZ19B1 (maize 19 kDa zein); and milps (myo-inositol-1-phosphate synthase); (WO00/11177; and U.S. Pat. No. 6,225,529). For dicots, seed-preferred promoters include, but are not limited to, bean β-phaseolin, napin, β-conglycinin, soybean lectin, cruciferin, and the like. For monocots, seed-preferred promoters include, but are not limited to, maize 15 kDa zein, 22 kDa zein, 27 kDa gamma zein, waxy, shrunken 1, shrunken 2, globulin 1, oleosin, and nuc1. See also, WO00/12733, where seed-preferred promoters from END1 and END2 genes are disclosed.

The term “inducible promoter” refers to promoters that selectively express a coding sequence or functional RNA in response to the presence of an endogenous or exogenous stimulus, for example by chemical compounds (chemical inducers) or in response to environmental, hormonal, chemical, and/or developmental signals. Inducible or regulated promoters include, for example, promoters induced or regulated by light, heat, stress, flooding or drought, salt stress, osmotic stress, phytohormones, wounding, or chemicals such as ethanol, abscisic acid (ABA), jasmonate, salicylic acid, or safeners.

An example of a stress-inducible promoter is the RD29A promoter (Kasuga et al. (1999) Nature Biotechnol. 17:287-91). One of ordinary skill in the art is familiar with protocols for simulating drought conditions and for evaluating drought tolerance of plants that have been subjected to simulated or naturally-occurring drought conditions. For example, one can simulate drought conditions by giving plants less water than normally required or no water over a period of time, and one can evaluate drought tolerance by looking for differences in physiological and/or physical condition, including (but not limited to) vigor, growth, size, or root length, or in particular, leaf color or leaf area size. Other techniques for evaluating drought tolerance include measuring chlorophyll fluorescence, photosynthetic rates and gas exchange rates. Also, one of ordinary skill in the art is familiar with protocols for simulating stress conditions such as osmotic stress, salt stress and temperature stress and for evaluating stress tolerance of plants that have been subjected to simulated or naturally-occurring stress conditions.

Another example of an inducible promoter useful in plant cells has been described in US patent application, US 2013-0312137A1, published on Nov. 21, 2013, incorporated by reference herein. US patent application US 2013-0312137A1 describes a ZmCAS1 promoter from a CBSU-Anther_Subtraction library (CAS1) gene encoding a mannitol dehydrogenase from maize, and functional fragments thereof. The ZmCAS1 promoter (also referred to as “CAS1 promoter”, “mannitol dehydrogenase promoter”, “mdh promoter”) can be induced by a chemical or stress treatment. The chemical can be a safener such as, but not limited to, N-(aminocarbonyl)-2-chlorobenzenesulfonamide (2-CBSU). The stress treatment can be a heat treatment such as, but not limited to, a heat shock treatment (see also U.S. provisional patent application, 62/120,421, filed on Feb. 25, 2015, incorporated by reference herein).

New promoters of various types useful in plant cells are constantly being discovered; numerous examples may be found in the compilation by Okamuro and Goldberg, (1989) In The Biochemistry of Plants, Vol. 115, Stumpf and Conn, eds (New York, N.Y.: Academic Press), pp. 1-82.

“Translation leader sequence” refers to a polynucleotide sequence located between the promoter sequence of a gene and the coding sequence. The translation leader sequence is present in the mRNA upstream of the translation start sequence. The translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency. Examples of translation leader sequences have been described (e.g., Turner and Foster, (1995) Mol Biotechnol 3:225-236).

“3′ non-coding sequences”, “transcription terminator” or “termination sequences” refer to DNA sequences located downstream of a coding sequence and include polyadenylation recognition sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3′ end of the mRNA precursor. The use of different 3′ non-coding sequences is exemplified by Ingelbrecht et al., (1989) Plant Cell 1:671-680.

“RNA transcript” refers to the product resulting from RNA polymerase-catalyzed transcription of a DNA sequence. When the RNA transcript is a perfect complementary copy of the DNA sequence, it is referred to as the primary transcript or pre-mRNA. An RNA transcript is referred to as the mature RNA or mRNA when it is an RNA sequence derived from post-transcriptional processing of the primary transcript (pre-mRNA). “Messenger RNA” or “mRNA” refers to the RNA that is without introns and that can be translated into protein by the cell. “cDNA” refers to a DNA that is complementary to, and synthesized from, an mRNA template using the enzyme reverse transcriptase. The cDNA can be single-stranded or converted into double-stranded form using the Klenow fragment of DNA polymerase I. “Sense” RNA refers to RNA transcript that includes the mRNA and can be translated into protein within a cell or in vitro. “Antisense RNA” refers to an RNA transcript that is complementary to all or part of a target primary transcript or mRNA, and that blocks the expression of a target gene (see, e.g., U.S. Pat. No. 5,107,065). The complementarity of an antisense RNA may be with any part of the specific gene transcript, i.e., at the 5′ non-coding sequence, 3′ non-coding sequence, introns, or the coding sequence. “Functional RNA” refers to antisense RNA, ribozyme RNA, or other RNA that may not be translated but yet has an effect on cellular processes. The terms “complement” and “reverse complement” are used interchangeably herein with respect to mRNA transcripts, and are meant to define the antisense RNA of the message.
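By way of illustration only, the relationship between an mRNA, its reverse complement (the antisense RNA of the message), and a first-strand cDNA can be sketched in a few lines of Python. The function names and the example sequence are hypothetical and are not part of the disclosure; the sketch assumes sequences are written 5′ to 3′ and contain only unambiguous bases.

```python
# Illustrative sketch; function names and the example sequence are hypothetical.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(mrna: str) -> str:
    """Return the antisense RNA (reverse complement) of an mRNA written 5'->3'."""
    mrna = mrna.upper()
    if set(mrna) - set("ACGU"):
        raise ValueError("sequence must contain only A, C, G and U")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return mrna.translate(RNA_COMPLEMENT)[::-1]


def first_strand_cdna(mrna: str) -> str:
    """Return the DNA strand complementary to the mRNA, written 5'->3'."""
    # Same base pairing as the antisense RNA, but with T in place of U.
    return reverse_complement_rna(mrna).replace("U", "T")


example_mrna = "AUGGCC"  # made-up sequence for demonstration
antisense = reverse_complement_rna(example_mrna)
cdna = first_strand_cdna(example_mrna)
```

The two outputs differ only in the use of U versus T, which mirrors the definitions above: the antisense RNA and the first-strand cDNA are both complementary to the same message.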

The term “operably linked” refers to the association of nucleic acid sequences on a single nucleic acid fragment so that the function of one is regulated by the other. For example, a promoter is operably linked with a coding sequence when it is capable of regulating the expression of that coding sequence (i.e., the coding sequence is under the transcriptional control of the promoter). Coding sequences can be operably linked to regulatory sequences in a sense or antisense orientation. In another example, the complementary RNA regions can be operably linked, either directly or indirectly, 5′ to the target mRNA, or 3′ to the target mRNA, or within the target mRNA, or a first complementary region is 5′ and its complement is 3′ to the target mRNA.

Standard recombinant DNA and molecular cloning techniques used herein are well known in the art and are described more fully in Sambrook et al., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory: Cold Spring Harbor, N.Y. (1989). Transformation methods are well known to those skilled in the art and are described infra.

“PCR” or “polymerase chain reaction” is a technique for the synthesis of specific DNA segments and consists of a series of repetitive denaturation, annealing, and extension cycles. Typically, a double-stranded DNA is heat denatured, and two primers complementary to the 3′ boundaries of the target segment are annealed to the DNA at low temperature, and then extended at an intermediate temperature. One set of these three consecutive steps is referred to as a “cycle”.
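As a rough illustration of why repeated cycles are used, the idealized yield of a PCR can be sketched as follows. This is a simplified model and not a description of any particular protocol: it assumes that each cycle multiplies the number of target copies by (1 + efficiency), where an efficiency of 1.0 corresponds to perfect doubling. The function name and parameters are hypothetical.

```python
# Idealized model only; real reactions plateau as reagents are consumed.
def pcr_copies(initial_templates: int, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate target copy number after a given number of PCR cycles.

    efficiency is the assumed fraction of templates copied per cycle
    (1.0 means every template is duplicated in every cycle).
    """
    if initial_templates < 0 or cycles < 0:
        raise ValueError("initial_templates and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_templates * (1.0 + efficiency) ** cycles


# One template taken through 30 ideal cycles (denaturation, annealing, extension).
ideal_yield = pcr_copies(1, 30)
```

Under these assumptions the copy number grows geometrically with the number of cycles, so a single template would in principle give on the order of a billion copies after 30 cycles.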

The term “recombinant” refers to an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis, or manipulation of isolated segments of nucleic acids by genetic engineering techniques.

The terms “plasmid”, “vector” and “cassette” refer to an extrachromosomal element often carrying genes that are not part of the central metabolism of the cell, and usually in the form of double-stranded DNA. Such elements may be autonomously replicating sequences, genome integrating sequences, phage, or nucleotide sequences, in linear or circular form, of a single- or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a polynucleotide of interest into a cell. “Transformation cassette” refers to a specific vector containing a gene and having elements in addition to the gene that facilitate transformation of a particular host cell. “Expression cassette” refers to a specific vector containing a gene and having elements in addition to the gene that allow for expression of that gene in a host.

The terms “recombinant DNA molecule”, “recombinant construct”, “expression construct”, “construct”, and “recombinant DNA construct” are used interchangeably herein. A recombinant construct comprises an artificial combination of nucleic acid fragments, e.g., regulatory and coding sequences that are not all found together in nature. For example, a construct may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature. Such a construct may be used by itself or may be used in conjunction with a vector. If a vector is used, then the choice of vector is dependent upon the method that will be used to transform host cells as is well known to those skilled in the art. For example, a plasmid vector can be used. The skilled artisan is well aware of the genetic elements that must be present on the vector in order to successfully transform, select and propagate host cells. The skilled artisan will also recognize that different independent transformation events may result in different levels and patterns of expression (Jones et al., (1985) EMBO J 4:2411-2418; De Almeida et al., (1989) Mol Gen Genetics 218:78-86), and thus that multiple events are typically screened in order to obtain lines displaying the desired expression level and pattern. Such screening may be accomplished by standard molecular biological, biochemical, and other assays including Southern analysis of DNA, Northern analysis of mRNA expression, PCR, real time quantitative PCR (qPCR), reverse transcription PCR (RT-PCR), immunoblotting analysis of protein expression, enzyme or activity assays, and/or phenotypic analysis.

The term “expression”, as used herein, refers to the production of a functional end-product (e.g., an mRNA, guide RNA, or a protein) in either precursor or mature form.

The term “providing” includes providing a nucleic acid (e.g., expression construct) or protein into a cell. Providing includes reference to the incorporation of a nucleic acid into a eukaryotic or prokaryotic cell where the nucleic acid may be incorporated into the genome of the cell, and includes reference to the transient provision of a nucleic acid or protein to the cell. Introduced includes reference to stable or transient transformation methods, as well as sexually crossing. Thus, “providing” in the context of inserting a nucleic acid fragment (e.g., a recombinant DNA construct/expression construct) into a cell, means “transfection” or “transformation” or “transduction” and includes reference to the incorporation of a nucleic acid fragment into a eukaryotic or prokaryotic cell where the nucleic acid fragment may be incorporated into the genome of the cell (e.g., chromosome, plasmid, plastid, or mitochondrial DNA), converted into an autonomous replicon, or transiently expressed (e.g., transfected mRNA).

“Mature” protein refers to a post-translationally processed polypeptide (i.e., one from which any pre- or propeptides present in the primary translation product have been removed). “Precursor” protein refers to the primary product of translation of mRNA (i.e., with pre- and propeptides still present). Pre- and propeptides may be but are not limited to intracellular localization signals.

“Stable transformation” refers to the transfer of a nucleic acid fragment into a genome of a host organism, including both nuclear and organellar genomes, resulting in genetically stable inheritance. In contrast, “transient transformation” refers to the transfer of a nucleic acid fragment into the nucleus, or other DNA-containing organelle, of a host organism resulting in gene expression without integration or stable inheritance. Host organisms containing the transformed nucleic acid fragments are referred to as “transgenic” organisms.

The commercial development of genetically improved germplasm has also advanced to the stage of introducing multiple traits into crop plants, often referred to as a gene stacking approach. In this approach, multiple genes conferring different characteristics of interest can be introduced into a plant. Gene stacking can be accomplished by many means including but not limited to co-transformation, retransformation, and crossing lines with different genes of interest.

The term “plant” refers to whole plants, plant organs, plant tissues, seeds, plant cells, and progeny of the same. Plant cells include, without limitation, cells from seeds, suspension cultures, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen and microspores. Plant parts include differentiated and undifferentiated tissues including, but not limited to roots, stems, shoots, leaves, pollens, seeds, tumor tissue and various forms of cells and culture (e.g., single cells, protoplasts, embryos, and callus tissue). The plant tissue may be in a plant or in a plant organ, tissue or cell culture. The term “plant organ” refers to plant tissue or a group of tissues that constitute a morphologically and functionally distinct part of a plant. The term “genome” refers to the entire complement of genetic material (genes and non-coding sequences) that is present in each cell of an organism, or virus or organelle; and/or a complete set of chromosomes inherited as a (haploid) unit from one parent. “Progeny” comprises any subsequent generation of a plant.

A transgenic plant includes, for example, a plant which comprises within its genome a heterologous polynucleotide introduced by a transformation step. The heterologous polynucleotide can be stably integrated within the genome such that the polynucleotide is passed on to successive generations. The heterologous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct. A transgenic plant can also comprise more than one heterologous polynucleotide within its genome. Each heterologous polynucleotide may confer a different trait to the transgenic plant. A heterologous polynucleotide can include a sequence that originates from a foreign species, or, if from the same species, can be substantially modified from its native form. Transgenic can include any cell, cell line, callus, tissue, plant part or plant, the genotype of which has been altered by the presence of heterologous nucleic acid including those transgenics initially so altered as well as those created by sexual crosses or asexual propagation from the initial transgenic. The alterations of the genome (chromosomal or extra-chromosomal) by conventional plant breeding methods, by the genome editing procedure described herein that does not result in an insertion of a foreign polynucleotide, or by naturally occurring events such as random cross-fertilization, non-recombinant viral infection, non-recombinant bacterial transformation, non-recombinant transposition, or spontaneous mutation are not intended to be regarded as transgenic.

In certain embodiments of the disclosure, a fertile plant is a plant that produces viable male and female gametes and is self-fertile. Such a self-fertile plant can produce a progeny plant without the contribution from any other plant of a gamete and the genetic material contained therein. Other embodiments of the disclosure can involve the use of a plant that is not self-fertile because the plant does not produce male gametes, or female gametes, or both, that are viable or otherwise capable of fertilization. As used herein, a “male sterile plant” is a plant that does not produce male gametes that are viable or otherwise capable of fertilization. As used herein, a “female sterile plant” is a plant that does not produce female gametes that are viable or otherwise capable of fertilization. It is recognized that male-sterile and female-sterile plants can be female-fertile and male-fertile, respectively. It is further recognized that a male fertile (but female sterile) plant can produce viable progeny when crossed with a female fertile plant and that a female fertile (but male sterile) plant can produce viable progeny when crossed with a male fertile plant.

The term “non-conventional yeast” herein refers to any yeast that is not a Saccharomyces (e.g., S. cerevisiae) or Schizosaccharomyces yeast species. Non-conventional yeast are described in Non-Conventional Yeasts in Genetics, Biochemistry and Biotechnology: Practical Protocols (K. Wolf, K. D. Breunig, G. Barth, Eds., Springer-Verlag, Berlin, Germany, 2003), which is incorporated herein by reference. Non-conventional yeast in certain embodiments may additionally (or alternatively) be yeast that favor non-homologous end-joining (NHEJ) DNA repair processes over repair processes mediated by homologous recombination (HR). Definition of a non-conventional yeast along these lines (preference of NHEJ over HR) is further disclosed by Chen et al. (PLoS ONE 8:e57952), which is incorporated herein by reference. Preferred non-conventional yeast herein are those of the genus Yarrowia (e.g., Yarrowia lipolytica). The term “yeast” herein refers to fungal species that predominantly exist in unicellular form. Yeast can alternatively be referred to as “yeast cells” herein (see also U.S. provisional application 62/036,652, filed on Aug. 13, 2014, which is incorporated by reference herein).

A “centimorgan” (cM) or “map unit” is the distance between two linked genes, markers, target sites, loci, or any pair thereof, wherein 1% of the products of meiosis are recombinant. Thus, a centimorgan is equivalent to a distance equal to a 1% average recombination frequency between the two linked genes, markers, target sites, loci, or any pair thereof.
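The definition above can be expressed as a simple calculation. The following Python sketch is illustrative only: the function name and the example counts are hypothetical, and the direct conversion of recombination frequency to map distance is an approximation that is generally reasonable only for closely linked loci, since it ignores multiple crossovers.

```python
# Illustrative sketch; the function name and counts are hypothetical.
def map_distance_cm(recombinant_products: int, total_products: int) -> float:
    """Approximate map distance in centimorgans from meiotic product counts.

    Uses the definition that 1 cM corresponds to 1% recombinant products.
    """
    if total_products <= 0:
        raise ValueError("total_products must be positive")
    if not 0 <= recombinant_products <= total_products:
        raise ValueError("recombinant_products must lie between 0 and total_products")
    return 100.0 * recombinant_products / total_products


# Example: 5 recombinant products observed among 1000 scored products.
distance = map_distance_cm(5, 1000)
```

For instance, if 1 of every 100 scored products of meiosis is recombinant between two markers, the markers would be estimated to lie about 1 cM apart under this approximation.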

The present disclosure finds use in the breeding of plants comprising one or more transgenic traits. Most commonly, transgenic traits are randomly inserted throughout the plant genome as a consequence of transformation systems based on Agrobacterium, biolistics, or other commonly used procedures. More recently, gene targeting protocols have been developed that enable directed transgene insertion. One important technology, site-specific integration (SSI), enables the targeting of a transgene to the same chromosomal location as a previously inserted transgene. Custom-designed meganucleases and custom-designed zinc finger meganucleases allow researchers to design nucleases to target specific chromosomal locations, and these reagents allow the targeting of transgenes at the chromosomal site cleaved by these nucleases.

The currently used systems for precision genetic engineering of eukaryotic genomes, e.g. plant genomes, rely upon homing endonucleases, meganucleases, zinc finger nucleases, and transcription activator-like effector nucleases (TALENs), which require de novo protein engineering for every new target locus. The highly specific, RNA-directed DNA nuclease, guide RNA/Cas9 endonuclease system described herein, is more easily customizable and therefore more useful when modification of many different target sequences is the goal. This disclosure takes further advantage of the two component nature of the guide RNA/Cas system, with its constant protein component, the Cas endonuclease, and its variable and easily reprogrammable targeting component, the guide RNA or the crRNA.

The guide RNA/Cas system described herein is especially useful for genome engineering, especially plant genome engineering, in circumstances where nuclease off-target cutting can be toxic to the targeted cells. In one embodiment of the guide RNA/Cas system described herein, the constant component, in the form of an expression-optimized Cas9 gene, is stably integrated into the target genome, e.g. plant genome. Expression of the Cas9 gene is under control of a promoter, e.g. plant promoter, which can be a constitutive promoter, tissue-specific promoter or inducible promoter, e.g. temperature-inducible, stress-inducible, developmental stage inducible, or chemically inducible promoter. In the absence of the variable component, i.e. the guide RNA or crRNA, the Cas9 protein is not able to cut DNA and therefore its presence in the plant cell should have little or no consequence. Hence a key advantage of the guide RNA/Cas system described herein is the ability to create and maintain a cell line or transgenic organism capable of efficient expression of the Cas9 protein with little or no consequence to cell viability. In order to induce cutting at desired genomic sites to achieve targeted genetic modifications, guide RNAs or crRNAs can be introduced by a variety of methods into cells containing the stably-integrated and expressed Cas9 gene. For example, guide RNAs or crRNAs can be chemically or enzymatically synthesized, and introduced into the Cas9 expressing cells via direct delivery methods such as particle bombardment or electroporation.

Alternatively, genes capable of efficiently expressing guide RNAs or crRNAs in the target cells can be synthesized chemically, enzymatically or in a biological system, and these genes can be introduced into the Cas9 expressing cells via direct delivery methods such as particle bombardment, electroporation or biological delivery methods such as Agrobacterium mediated DNA delivery.

A guide RNA/Cas system mediating gene targeting can be used in methods for directing transgene insertion and/or for producing complex transgenic trait loci comprising multiple transgenes in a fashion similar to that disclosed in WO2013/0198888 (published Aug. 1, 2013) where instead of using a double strand break inducing agent to introduce a gene of interest, a guide RNA/Cas system as disclosed herein is used. In one embodiment, a complex transgenic trait locus is a genomic locus that has multiple transgenes genetically linked to each other. By inserting independent transgenes within 0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 2, or even 5 centimorgans (cM) from each other, the transgenes can be bred as a single genetic locus (see, for example, U.S. patent application Ser. No. 13/427,138 or PCT application PCT/US2012/030061). After selecting a plant comprising a transgene, plants containing (at least) one transgene can be crossed to form an F1 that contains both transgenes. In progeny from these F1 (F2 or BC1), 1/500 progeny would have the two different transgenes recombined onto the same chromosome. The complex locus can then be bred as a single genetic locus with both transgene traits. This process can be repeated to stack as many traits as desired.

Chromosomal intervals that correlate with a phenotype or trait of interest can be identified. A variety of methods well known in the art are available for identifying chromosomal intervals. The boundaries of such chromosomal intervals are drawn to encompass markers that will be linked to the gene controlling the trait of interest. In other words, the chromosomal interval is drawn such that any marker that lies within that interval (including the terminal markers that define the boundaries of the interval) can be used as a marker for northern leaf blight resistance. In one embodiment, the chromosomal interval comprises at least one QTL, and furthermore, may indeed comprise more than one QTL. Close proximity of multiple QTLs in the same interval may obfuscate the correlation of a particular marker with a particular QTL, as one marker may demonstrate linkage to more than one QTL. Conversely, e.g., if two markers in close proximity show co-segregation with the desired phenotypic trait, it is sometimes unclear if each of those markers identifies the same QTL or two different QTL. The term “quantitative trait locus” or “QTL” refers to a region of DNA that is associated with the differential expression of a quantitative phenotypic trait in at least one genetic background, e.g., in at least one breeding population. The region of the QTL encompasses or is closely linked to the gene or genes that affect the trait in question. An “allele of a QTL” can comprise multiple genes or other genetic factors within a contiguous genomic region or linkage group, such as a haplotype. An allele of a QTL can denote a haplotype within a specified window wherein said window is a contiguous genomic region that can be defined, and tracked, with a set of one or more polymorphic markers. A haplotype can be defined by the unique fingerprint of alleles at each marker within the specified window.
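The notion of a haplotype as a fingerprint of alleles at each marker within a window can be illustrated with a short Python sketch. The marker names (M1 to M4), the alleles, and the function name are all hypothetical and chosen only for demonstration; the sketch assumes each chromosome is described by a mapping from marker name to the allele observed at that marker.

```python
from collections import Counter

# Hypothetical window of polymorphic markers defining a contiguous region.
WINDOW = ["M1", "M2", "M3"]


def haplotype(alleles_by_marker, window):
    """Return the ordered tuple of alleles at each marker in the window."""
    return tuple(alleles_by_marker[marker] for marker in window)


# Made-up allele calls for three chromosomes; M4 lies outside the window.
chromosomes = [
    {"M1": "A", "M2": "G", "M3": "T", "M4": "C"},
    {"M1": "A", "M2": "G", "M3": "T", "M4": "T"},
    {"M1": "C", "M2": "G", "M3": "T", "M4": "C"},
]

# Chromosomes sharing the same fingerprint within the window share a haplotype.
haplotype_counts = Counter(haplotype(c, WINDOW) for c in chromosomes)
```

In this sketch the first two chromosomes carry the same haplotype within the window even though they differ at marker M4, which lies outside it.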

A variety of methods are available to identify those cells having an altered genome at or near a target site without using a screenable marker phenotype. Such methods can be viewed as directly analyzing a target sequence to detect any change in the target sequence, including but not limited to PCR methods, sequencing methods, nuclease digestion, Southern blots, and any combination thereof.

Proteins may be altered in various ways including amino acid substitutions, deletions, truncations, and insertions. Methods for such manipulations are generally known. For example, amino acid sequence variants of the protein(s) can be prepared by mutations in the DNA. Methods for mutagenesis and nucleotide sequence alterations include, for example, Kunkel, (1985) Proc. Natl. Acad. Sci. USA 82:488-92; Kunkel et al., (1987) Meth Enzymol 154:367-82; U.S. Pat. No. 4,873,192; Walker and Gaastra, eds. (1983) Techniques in Molecular Biology (MacMillan Publishing Company, New York) and the references cited therein. Guidance regarding amino acid substitutions not likely to affect biological activity of the protein is found, for example, in the model of Dayhoff et al., (1978) Atlas of Protein Sequence and Structure (Natl Biomed Res Found, Washington, D.C.). Conservative substitutions, such as exchanging one amino acid with another having similar properties, may be preferable. Conservative deletions, insertions, and amino acid substitutions are not expected to produce radical changes in the characteristics of the protein, and the effect of any substitution, deletion, insertion, or combination thereof can be evaluated by routine screening assays. Assays for double-strand-break-inducing activity are known and generally measure the overall activity and specificity of the agent on DNA substrates containing target sites.

A variety of methods are known for the introduction of nucleotide sequences and polypeptides into an organism, including, for example, transformation, sexual crossing, and the introduction of the polypeptide, DNA, or mRNA into the cell.

Methods for contacting, providing, and/or introducing a composition into various organisms are known and include but are not limited to, stable transformation methods, transient transformation methods, virus-mediated methods, and sexual breeding. Stable transformation indicates that the introduced polynucleotide integrates into the genome of the organism and is capable of being inherited by progeny thereof. Transient transformation indicates that the introduced composition is only temporarily expressed or present in the organism.

Protocols for introducing polynucleotides and polypeptides into plants may vary depending on the type of plant or plant cell targeted for transformation, such as monocot or dicot. Suitable methods of introducing polynucleotides and polypeptides into plant cells and subsequent insertion into the plant genome include microinjection (Crossway et al., (1986) Biotechniques 4:320-34 and U.S. Pat. No. 6,300,543), meristem transformation (U.S. Pat. No. 5,736,369), electroporation (Riggs et al., (1986) Proc. Natl. Acad. Sci. USA 83:5602-6), Agrobacterium-mediated transformation (U.S. Pat. Nos. 5,563,055 and 5,981,840), direct gene transfer (Paszkowski et al., (1984) EMBO J 3:2717-22), and ballistic particle acceleration (U.S. Pat. Nos. 4,945,050; 5,879,918; 5,886,244; 5,932,782; Tomes et al., (1995) “Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment” in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg & Phillips (Springer-Verlag, Berlin); McCabe et al., (1988) Biotechnology 6:923-6; Weissinger et al., (1988) Ann Rev Genet 22:421-77; Sanford et al., (1987) Particulate Science and Technology 5:27-37 (onion); Christou et al., (1988) Plant Physiol 87:671-4 (soybean); Finer and McMullen, (1991) In Vitro Cell Dev Biol 27P:175-82 (soybean); Singh et al., (1998) Theor Appl Genet 96:319-24 (soybean); Datta et al., (1990) Biotechnology 8:736-40 (rice); Klein et al., (1988) Proc. Natl. Acad. Sci. USA 85:4305-9 (maize); Klein et al., (1988) Biotechnology 6:559-63 (maize); U.S. Pat. Nos. 5,240,855; 5,322,783 and 5,324,646; Klein et al., (1988) Plant Physiol 91:440-4 (maize); Fromm et al., (1990) Biotechnology 8:833-9 (maize); Hooykaas-Van Slogteren et al., (1984) Nature 311:763-4; U.S. Pat. No. 5,736,369 (cereals); Bytebier et al., (1987) Proc. Natl. Acad. Sci. USA 84:5345-9 (Liliaceae); De Wet et al., (1985) in The Experimental Manipulation of Ovule Tissues, ed. Chapman et al., (Longman, New York), pp. 197-209 (pollen); Kaeppler et al., (1990) Plant Cell Rep 9:415-8 and Kaeppler et al., (1992) Theor Appl Genet 84:560-6 (whisker-mediated transformation); D'Halluin et al., (1992) Plant Cell 4:1495-505 (electroporation); Li et al., (1993) Plant Cell Rep 12:250-5; Christou and Ford (1995) Annals Botany 75:407-13 (rice) and Osjoda et al., (1996) Nat Biotechnol 14:745-50 (maize via Agrobacterium tumefaciens)).

Alternatively, polynucleotides may be introduced into plants by contacting plants with a virus or viral nucleic acids. Generally, such methods involve incorporating a polynucleotide within a viral DNA or RNA molecule. In some examples a polypeptide of interest may be initially synthesized as part of a viral polyprotein, which is later processed by proteolysis in vivo or in vitro to produce the desired recombinant protein. Methods for introducing polynucleotides into plants and expressing a protein encoded therein, involving viral DNA or RNA molecules, are known, see, for example, U.S. Pat. Nos. 5,889,191, 5,889,190, 5,866,785, 5,589,367 and 5,316,931. Transient transformation methods include, but are not limited to, the introduction of polypeptides, such as a double-strand break inducing agent, directly into the organism, the introduction of polynucleotides such as DNA and/or RNA polynucleotides, and the introduction of the RNA transcript, such as an mRNA encoding a double-strand break inducing agent, into the organism. Such methods include, for example, microinjection or particle bombardment. See, for example, Crossway et al., (1986) Mol Gen Genet 202:179-85; Nomura et al., (1986) Plant Sci 44:53-8; Hepler et al., (1994) Proc. Natl. Acad. Sci. USA 91:2176-80; and, Hush et al., (1994) J Cell Sci 107:775-84.

The term “dicot” refers to the subclass of angiosperm plants also known as “dicotyledoneae” and includes reference to whole plants, plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, and progeny of the same. Plant cell, as used herein, includes, without limitation, seeds, suspension cultures, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, and microspores.

The term “crossed” or “cross” or “crossing” in the context of this disclosure means the fusion of gametes via pollination to produce progeny (i.e., cells, seeds, or plants). The term encompasses both sexual crosses (the pollination of one plant by another) and selfing (self-pollination, i.e., when the pollen and ovule (or microspores and megaspores) are from the same plant or genetically identical plants).

The term “introgression” refers to the transmission of a desired allele of a genetic locus from one genetic background to another. For example, introgression of a desired allele at a specified locus can be transmitted to at least one progeny plant via a sexual cross between two parent plants, where at least one of the parent plants has the desired allele within its genome. Alternatively, for example, transmission of an allele can occur by recombination between two donor genomes, e.g., in a fused protoplast, where at least one of the donor protoplasts has the desired allele in its genome. The desired allele can be, e.g., a transgene, a modified (mutated or edited) native allele, or a selected allele of a marker or QTL.

Standard DNA isolation, purification, molecular cloning, vector construction, and verification/characterization methods are well established, see, for example, Sambrook et al., (1989) Molecular Cloning: A Laboratory Manual, (Cold Spring Harbor Laboratory Press, NY). Vectors and constructs include circular plasmids, and linear polynucleotides, comprising a polynucleotide of interest and optionally other components including linkers, adapters, regulatory or analysis. In some examples a recognition site and/or target site can be contained within an intron, coding sequence, 5′ UTRs, 3′ UTRs, and/or regulatory regions.

The present disclosure further provides expression constructs for expressing in a plant, plant cell, or plant part a guide RNA/Cas system that is capable of binding to and creating a double strand break in a target site. In one embodiment, the expression constructs of the disclosure comprise a promoter operably linked to a nucleotide sequence encoding a Cas gene and a promoter operably linked to a guide RNA of the present disclosure. The promoter is capable of driving expression of an operably linked nucleotide sequence in a plant cell.

A phenotypic marker is a screenable or selectable marker, including visual markers and selectable markers, whether positive or negative. Any phenotypic marker can be used. Specifically, a selectable or screenable marker comprises a DNA segment that allows one to identify, or select for or against, a molecule or a cell that contains it, often under particular conditions. These markers can encode an activity, such as, but not limited to, production of RNA, peptide, or protein, or can provide a binding site for RNA, peptides, proteins, inorganic and organic compounds or compositions and the like.

Examples of selectable markers include, but are not limited to, DNA segments that comprise restriction enzyme sites; DNA segments that encode products which provide resistance against otherwise toxic compounds including antibiotics, such as spectinomycin, ampicillin, kanamycin, tetracycline, Basta, neomycin phosphotransferase II (NEO) and hygromycin phosphotransferase (HPT); DNA segments that encode products which are otherwise lacking in the recipient cell (e.g., tRNA genes, auxotrophic markers); DNA segments that encode products which can be readily identified (e.g., phenotypic markers such as β-galactosidase, GUS; fluorescent proteins such as green fluorescent protein (GFP), cyan (CFP), yellow (YFP), red (RFP), and cell surface proteins); the generation of new primer sites for PCR (e.g., the juxtaposition of two DNA sequences not previously juxtaposed); the inclusion of DNA sequences not acted upon or acted upon by a restriction endonuclease or other DNA modifying enzyme, chemical, etc.; and the inclusion of DNA sequences required for a specific modification (e.g., methylation) that allows their identification.

Additional selectable markers include genes that confer resistance to herbicidal compounds, such as glufosinate ammonium, bromoxynil, imidazolinones, and 2,4-dichlorophenoxyacetate (2,4-D). See, for example, Yarranton, (1992) Curr Opin Biotech 3:506-11; Christopherson et al., (1992) Proc. Natl. Acad. Sci. USA 89:6314-8; Yao et al., (1992) Cell 71:63-72; Reznikoff, (1992) Mol Microbiol 6:2419-22; Hu et al., (1987) Cell 48:555-66; Brown et al., (1987) Cell 49:603-12; Figge et al., (1988) Cell 52:713-22; Deuschle et al., (1989) Proc. Natl. Acad. Sci. USA 86:5400-4; Fuerst et al., (1989) Proc. Natl. Acad. Sci. USA 86:2549-53; Deuschle et al., (1990) Science 248:480-3; Gossen, (1993) Ph.D. Thesis, University of Heidelberg; Reines et al., (1993) Proc. Natl. Acad. Sci. USA 90:1917-21; Labow et al., (1990) Mol Cell Biol 10:3343-56; Zambretti et al., (1992) Proc. Natl. Acad. Sci. USA 89:3952-6; Baim et al., (1991) Proc. Natl. Acad. Sci. USA 88:5072-6; Wyborski et al., (1991) Nucleic Acids Res 19:4647-53; Hillen and Wissman, (1989) Topics Mol Struc Biol 10:143-62; Degenkolb et al., (1991) Antimicrob Agents Chemother 35:1591-5; Kleinschnidt et al., (1988) Biochemistry 27:1094-104; Bonin, (1993) Ph.D. Thesis, University of Heidelberg; Gossen et al., (1992) Proc. Natl. Acad. Sci. USA 89:5547-51; Oliva et al., (1992) Antimicrob Agents Chemother 36:913-9; Hlavka et al., (1985) Handbook of Experimental Pharmacology, Vol. 78 (Springer-Verlag, Berlin); Gill et al., (1988) Nature 334:721-4.

The cells having the introduced sequence may be grown or regenerated into plants using conventional conditions, see, for example, McCormick et al., (1986) Plant Cell Rep 5:81-4. These plants may then be grown, and either pollinated with the same transformed strain or with a different transformed or untransformed strain, and the resulting progeny having the desired characteristic and/or comprising the introduced polynucleotide or polypeptide can be identified. Two or more generations may be grown to ensure that the polynucleotide is stably maintained and inherited, and seeds harvested.

Any plant can be used, including monocot and dicot plants. Examples of monocot plants that can be used include, but are not limited to, corn (Zea mays), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), wheat (Triticum aestivum), sugarcane (Saccharum spp.), oats (Avena), barley (Hordeum), switchgrass (Panicum virgatum), pineapple (Ananas comosus), banana (Musa spp.), palm, ornamentals, turfgrasses, and other grasses. Examples of dicot plants that can be used include, but are not limited to, soybean (Glycine max), canola (Brassica napus and B. campestris), alfalfa (Medicago sativa), tobacco (Nicotiana tabacum), Arabidopsis (Arabidopsis thaliana), sunflower (Helianthus annuus), cotton (Gossypium arboreum), peanut (Arachis hypogaea), tomato (Solanum lycopersicum), and potato (Solanum tuberosum).

The transgenes, recombinant DNA molecules, DNA sequences of interest, and polynucleotides of interest can comprise one or more genes of interest. Such genes of interest can encode, for example, a protein that provides agronomic advantage to the plant.

Also provided are kits for performing any of the methods described herein. The kits typically contain polynucleotides encoding one or more Cas endonucleases, or Cas endonuclease protein wherein the Cas endonuclease protein is provided as a purified protein, a cell lysate comprising said Cas endonuclease, a dilution of a cell lysate comprising said Cas endonuclease, an in-vitro translation mixture or a dilution of an in-vitro translation mixture, and/or single or dual guide polynucleotides, and/or template polynucleotides for gene editing and/or donor polynucleotides for inserting polynucleotides of interest into a genome of interest, as described herein. The kit can further contain instructions for administering all these components into the cells. The kits can also contain cells, buffers for transformation of cells, culture media for cells, and/or buffers for performing assays. The kits can further contain one or more inhibitors of proteins involved in NHEJ, or components which promote or increase homology-dependent repair (HDR), and instructions for introducing the Cas endonucleases and inhibitors into the cells such that Cas endonuclease-mediated gene disruption and/or targeted integration is enhanced. Optionally, cells containing the target site(s) of the Cas endonuclease may also be included in the kits described herein.

Inhibitors of non-homologous end joining (NHEJ) are known in the art and include molecules, such as but not limited to small molecules that inhibit (decrease) the binding or activity of a DNA-dependent-protein kinase catalytic subunit (DNA-PKcs), a Poly(ADP-ribose) polymerase 1/2 (PARP1/2), a PARP1, Ku70/80, a DNA-PKcs, a XRCC4/XLF, a Ligase IV, a Ligase III, a XRCC1, an Artemis Polynucleotide Kinase (PNK), SCR7, and any combination thereof (Sfeir et al. 2015, TIBS Vol 40 (11), pp 701-713; Srivastava, M. et al. An inhibitor of nonhomologous end-joining abrogates double-strand break repair and impedes cancer progression. Cell 151, 1474-1487 (2012); US patent application US2014/0242702, published on Aug. 28, 2014, herein incorporated in its entirety by reference). Other molecules that decrease the activity of the non-homologous end joining (NHEJ) DNA repair complex are known in the art and include RNAi-molecules, antisense nucleic acid molecules, ribozymes, compounds inhibiting the formation of a functional DNA Ligase IV (LIG4) complex and compounds enhancing proteolytic degradation of a functional DNA Ligase IV complex (US patent application 2014/0304847, published on Oct. 9, 2014, herein incorporated in its entirety by reference).

Activators of HDR are known in the art and include molecules, such as but not limited to RS-1, RAD51 and RAD51B (Song et al. 2016 “RS-1 enhances CRISPR/Cas9- and TALEN-mediated knock-in efficiency” Nature communications 7, Article number:10548; Takaku, M. et al. Recombination activator function of the novel RAD51- and RAD51B-binding protein, human EVL. J. Biol. Chem. 284, 14326-14336 (2009)).

In certain embodiments, the kits comprise at least one construct with a target gene and a Cas endonuclease described herein capable of cleaving within or in close proximity to the target gene. Such kits are useful for optimization of cleavage conditions in a variety of host cell types. In one aspect, the kit is useful for increasing gene disruption, gene editing and/or targeted integration following Cas endonuclease mediated cleavage of a cell's genome.

In one embodiment, the kit includes a Cas endonuclease described herein capable of cleaving within a known target locus within a genome, and may additionally comprise a template DNA for gene editing and/or a donor nucleic acid for introducing a polynucleotide of interest into the cell's genome. Such kits are useful for optimization of conditions for template recognition, donor integration or for the construction of specifically modified cells, cell lines, and transgenic plants and animals containing gene disruptions, gene edits or targeted insertions. These and other aspects will be readily apparent to the skilled artisan in light of the disclosure as a whole.

Also provided are kits containing any one or more of the elements disclosed in compositions described herein. In one aspect, the kits comprise a single guide polynucleotide comprising a crRNA, as described herein, linked to a tracrRNA, wherein the crRNA comprises a variable targeting domain operably linked to a tracr mate sequence and/or one or more insertion sites for inserting or exchanging the variable targeting domain upstream of the tracr mate sequence, wherein when expressed, the single guide polynucleotide directs sequence-specific binding of a guide polynucleotide/Cas endonuclease complex to a target sequence in a eukaryotic cell. In another aspect, the kits comprise a dual guide polynucleotide comprising a crRNA molecule and a tracrRNA molecule, as described herein, wherein the crRNA molecule comprises a variable targeting domain operably linked to a tracr mate sequence and/or one or more insertion sites for inserting or exchanging the variable targeting domain upstream of the tracr mate sequence, wherein when expressed, the dual guide polynucleotide directs sequence-specific binding of a guide polynucleotide/Cas endonuclease complex to a target sequence in a eukaryotic cell.

The kits can contain one or more vectors encoding the guide polynucleotides, Cas endonucleases and/or template DNAs and/or donor DNAs described herein, and/or the kits can contain the elements themselves (guide polynucleotides, DNA templates, DNA donors and/or Cas endonucleases in purified or non-purified forms).

In one aspect, the kit comprises a Cas endonuclease as described herein, and/or a polynucleotide modification template and/or a donor DNA for inserting a polynucleotide of interest as described herein.

Components may be provided individually or in combinations, and may be provided in any suitable container, such as a vial, a bottle, or a tube. For example, a kit may provide one or more reaction or storage buffers. Reagents may be provided in a form that is usable in a particular assay, or in a form that requires addition of one or more other components before use (e.g. in concentrate or lyophilized form). A buffer can be any buffer, including but not limited to a sodium carbonate buffer, a sodium bicarbonate buffer, a borate buffer, a Tris buffer, a MOPS buffer, a HEPES buffer, and combinations thereof. In some embodiments, the buffer is alkaline. In some embodiments, the buffer has a pH from about 7 to about 10. In some aspects, the kit includes instructions in one or more languages, for example in more than one language.

The meaning of abbreviations is as follows: “sec” means second(s), “min” means minute(s), “h” means hour(s), “d” means day(s), “μL” means microliter(s), “mL” means milliliter(s), “L” means liter(s), “μM” means micromolar, “mM” means millimolar, “M” means molar, “mmol” means millimole(s), “μmole” means micromole(s), “g” means gram(s), “μg” means microgram(s), “ng” means nanogram(s), “U” means unit(s), “bp” means base pair(s) and “kb” means kilobase(s).

Non-limiting examples of compositions and methods disclosed herein are as follows:

-   1. A method for producing a plasmid DNA library containing a randomized Protospacer-Adjacent-Motif (PAM) sequence, the method comprising:
    -   a) providing a first single stranded oligonucleotide comprising a target sequence that can be recognized by a guide RNA/Cas endonuclease complex;
    -   b) providing a second single stranded oligonucleotide comprising a randomized PAM sequence adjacent to a nucleotide sequence capable of hybridizing with the target sequence of (a);
    -   c) producing an oligoduplex comprising said randomized PAM sequence by combining the first single stranded oligonucleotide of (a) and the second single stranded oligonucleotide of (b);
    -   d) producing a ligation product by ligating the oligoduplex from (c) with a linearized plasmid; and,
    -   e) transforming host cells with the ligation product of (d) and recovering multiple host cell colonies representing the plasmid library.

-   2. A method for producing a ligation product containing a randomized Protospacer-Adjacent-Motif (PAM) sequence, the method comprising:
    -   a) providing a first single stranded oligonucleotide comprising a restriction endonuclease recognition site located upstream of a target sequence that can be recognized by a guide RNA/Cas endonuclease complex;
    -   b) providing a second single stranded oligonucleotide comprising a randomized PAM sequence adjacent to a nucleotide sequence capable of hybridizing with the target sequence of (a);
    -   c) producing an oligoduplex comprising said randomized PAM sequence by combining the first single stranded oligonucleotide of (a) and the second single stranded oligonucleotide of (b); and,
    -   d) producing a ligation product by ligating the oligoduplex from (c) with a linearized plasmid.

-   3. The method of embodiment 1, wherein the host cells of (e) are E. coli cells.

-   4. A ligation product produced by the method of any one of embodiments 1-2.

-   5. A library of host cells produced by the method of embodiment 1.

-   6. The method of any one of embodiments 1-2, wherein the first single stranded oligonucleotide comprises a restriction endonuclease recognition site located upstream of a target sequence and wherein the ligation product of (d) is produced by first cleaving the oligoduplex with a restriction endonuclease that recognizes the restriction endonuclease recognition site of (a) followed by ligating the cleaved oligoduplex from (d) with a linearized plasmid.

-   7. The method of any one of embodiments 1-2, wherein the second single stranded oligonucleotide comprises a randomized PAM of at least 5 randomized nucleotides (5Ns).

-   8. The method of any one of embodiments 1-2, wherein the second single stranded oligonucleotide comprises a randomized PAM of at least 7 randomized nucleotides (7Ns).
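As an illustration of the library sizes implied by embodiments 7 and 8, a fully randomized stretch of N nucleotides can in principle take 4^N distinct sequences. The following Python sketch is not part of the disclosed methods; the function names `pam_library_size` and `enumerate_pams` are hypothetical and are introduced only to show the arithmetic, under the assumption that each randomized position is drawn from A, C, G and T.

```python
from itertools import product

BASES = "ACGT"  # assumption: every randomized position may be any of the four bases


def pam_library_size(n_random: int) -> int:
    """Number of distinct sequences in a fully randomized stretch of n_random bases."""
    if n_random < 0:
        raise ValueError("n_random must be non-negative")
    return len(BASES) ** n_random


def enumerate_pams(n_random: int):
    """Yield every possible randomized PAM of the given length (small n only)."""
    for combo in product(BASES, repeat=n_random):
        yield "".join(combo)
```

Under these assumptions a 5N region corresponds to 1,024 possible variants and a 7N region to 16,384, which gives a lower bound on the number of independent host cell colonies needed to represent every variant at least once.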

-   9. A method for identification of a Protospacer-Adjacent-Motif (PAM) sequence, the method comprising:
    -   a) providing a library of plasmid DNAs, wherein each one of said plasmid DNAs comprises a randomized Protospacer-Adjacent-Motif sequence integrated adjacent to a target sequence that can be recognized by a guide RNA/Cas endonuclease complex;
    -   b) providing to said library of plasmids a guide RNA and a Cas endonuclease protein, wherein said guide RNA and Cas endonuclease protein can form a complex that is capable of introducing a double strand break into said target sequence, thereby creating a library of cleaved targets;
    -   c) ligating adaptors to the library of cleaved targets of (b) allowing for the library of cleaved targets to be amplified;
    -   d) amplifying the library of cleaved targets such that cleaved products containing the randomized PAM sequence are enriched, thereby producing a library of enriched PAM-sided targets;
    -   e) sequencing the library of (a) and the library of enriched PAM-sided targets of (d) and identifying the nucleotide sequence adjacent to the cleaved targets of (b) on either strand of the plasmid DNA, wherein said nucleotide sequence represents a putative Protospacer-Adjacent-Motif sequence; and,
    -   f) determining the fold enrichment of each nucleotide within the putative Protospacer-Adjacent-Motif sequence relative to the plasmid DNA library of (a).
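Steps (e) and (f) of embodiment 9 amount to comparing per-position nucleotide frequencies between the enriched PAM-sided targets and the starting plasmid library. The Python sketch below is one possible way to compute such a fold enrichment and is not asserted to be the procedure used in the disclosure. It assumes the PAM regions have already been extracted from the sequencing reads as equal-length strings; the function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def position_frequencies(pams):
    """Return one dict per PAM position mapping each base to its frequency."""
    if not pams:
        raise ValueError("no sequences supplied")
    length = len(pams[0])
    if any(len(p) != length for p in pams):
        raise ValueError("all PAM sequences must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(p[i] for p in pams)
        total = sum(counts.values())
        freqs.append({b: counts.get(b, 0) / total for b in BASES})
    return freqs


def fold_enrichment(enriched, control):
    """Per-position ratio of base frequency in `enriched` over `control`.

    A base absent from the control library at a position yields NaN,
    since the ratio is undefined there.
    """
    enr = position_frequencies(enriched)
    ctl = position_frequencies(control)
    if len(enr) != len(ctl):
        raise ValueError("libraries must cover the same number of positions")
    return [
        {b: (e[b] / c[b]) if c[b] > 0 else float("nan") for b in BASES}
        for e, c in zip(enr, ctl)
    ]
```

In this sketch a ratio well above 1 at a given position suggests that the Cas endonuclease prefers that nucleotide there, while a ratio near 1 suggests no preference. Normalizing against the starting library, as step (f) requires, corrects for any bias already present in the randomized library before cleavage.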

-   10. The method of any one of embodiments 1-2 and 9, wherein the randomized PAM sequence comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 randomized nucleotides.

-   11. The method of any one of embodiments 1-2 and 9, wherein the target sequence is at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides in length.

-   12. The method of embodiment 9, wherein the Cas endonuclease is a    Cas9 endonuclease from an organism selected from the group    consisting of Brevibacillus laterosporus, Lactobacillus reuteri    MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4,    Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC,    Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis,    Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811,    Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755.

-   13. The method of embodiment 9, wherein the guide RNA comprises a    single molecule of a chimeric non-naturally occurring crRNA linked    to a tracrRNA.

-   14. The method of embodiment 9, wherein the guide RNA comprises a    duplex molecule of a chimeric non-naturally occurring crRNA and a    tracrRNA.

-   15. The method of embodiment 9, wherein the chimeric non-naturally occurring crRNA comprises a variable targeting domain capable of hybridizing to a target sequence in the genome of an organism, wherein said crRNA is linked to a tracrRNA originating from an organism selected from the group consisting of Brevibacillus laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755.

-   16. The method of embodiment 9, wherein the chimeric non-naturally    occurring crRNA comprises a variable targeting domain capable of    hybridizing to a target sequence in the genome of an organism,    wherein said crRNA can form a duplex with a tracrRNA originating    from an organism selected from the group consisting of Brevibacillus    laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM    15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM    14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM    20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02,    Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and    Psychroflexus torquis ATCC 700755.

-   17. The method of embodiment 9, wherein the chimeric non-naturally    occurring crRNA comprises at least a fragment of a crRNA originating    from an organism selected from the group consisting of Brevibacillus    laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM    15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM    14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM    20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02,    Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and    Psychroflexus torquis ATCC 700755.

-   18. A recombinant construct comprising at least one Protospacer-Adjacent-Motif (PAM) sequence identified by the method of embodiment 9.

-   19. A method for identification of a tracrRNA of an organism, the method comprising:
    -   a) providing a first single guide RNA candidate comprising a chimeric non-naturally occurring crRNA comprising a variable targeting domain capable of hybridizing to a target sequence in the genome of a cell, linked to a first nucleotide sequence representing the sense expression of a candidate tracrRNA naturally occurring in said organism;
    -   b) providing a second single guide RNA candidate comprising a chimeric non-naturally occurring crRNA comprising a variable targeting domain capable of hybridizing to a target sequence in the genome of said cell, linked to a second nucleotide sequence representing the sense expression of a candidate tracrRNA naturally occurring in said organism;
    -   c) providing to the first and second single guide RNA candidates a Cas endonuclease protein, wherein said Cas endonuclease protein can form a complex with either the first single guide RNA candidate or the second single guide RNA candidate, wherein said complex is capable of introducing a double strand break into said target sequence; and,
    -   d) identification of the first or second guide RNA candidate and its tracrRNA component that complexes to the Cas endonuclease of (c) and results in cleavage of the target sequence in the genome of said cell.

-   20. A method for identification of a tracrRNA of an organism, the method comprising:
    -   a) identifying a CRISPR array repeat sequence in a genomic locus of said organism;
    -   b) aligning the CRISPR array repeat sequence of (a) with the sequence of the genomic locus of (a) and identifying an antirepeat sequence that encodes a tracrRNA; and,
    -   c) determining the transcriptional direction of the tracrRNA.
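Step (b) of embodiment 20 can be illustrated with a deliberately simplified search. The Python sketch below slides the reverse complement of a CRISPR repeat along a locus sequence and reports the ungapped window with the most matching bases as a candidate anti-repeat. It is an assumption-laden illustration, not the alignment procedure of the disclosure: it ignores gaps, searches only the strand supplied, and expects the caller to exclude the CRISPR array itself from `locus`. All function names are hypothetical. Step (c), determining transcriptional direction, is not addressed here.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def best_ungapped_match(query: str, locus: str):
    """Return (start, matches) for the best ungapped placement of query in locus.

    Ties are resolved in favor of the leftmost window.
    """
    query, locus = query.upper(), locus.upper()
    if not query or len(query) > len(locus):
        raise ValueError("query must be non-empty and no longer than locus")
    best_start, best_matches = 0, -1
    for start in range(len(locus) - len(query) + 1):
        window = locus[start:start + len(query)]
        matches = sum(q == w for q, w in zip(query, window))
        if matches > best_matches:
            best_start, best_matches = start, matches
    return best_start, best_matches


def find_antirepeat_candidate(repeat: str, locus: str) -> dict:
    """Locate the window of locus most complementary to the repeat."""
    start, matches = best_ungapped_match(reverse_complement(repeat), locus)
    return {
        "start": start,
        "end": start + len(repeat),
        "identity": matches / len(repeat),
    }
```

Because natural anti-repeats are typically only partially complementary to the repeat, a practical search would tolerate mismatches and gaps and would examine both strands; the identity value returned here is meant only as a crude ranking score.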

-   21. A guide RNA capable of forming a guide RNA/Cas endonuclease complex, wherein said guide RNA/Cas endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said guide RNA is a duplex molecule comprising a chimeric non-naturally occurring crRNA and a tracrRNA, wherein said chimeric non-naturally occurring crRNA comprises a variable targeting domain capable of hybridizing to said target sequence, wherein said tracrRNA originates from an organism selected from the group consisting of Brevibacillus laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755.

-   22. A guide RNA capable of forming a guide RNA/Cas endonuclease    complex, wherein said guide RNA/Cas endonuclease complex can    recognize, bind to, and optionally nick or cleave a target sequence,    wherein said guide RNA is a single molecule comprising a chimeric    non-naturally occurring crRNA linked to a tracrRNA originating from    an organism selected from the group consisting of Brevibacillus    laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM    15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM    14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM    20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02,    Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and    Psychroflexus torquis ATCC 700755, wherein said chimeric    non-naturally occurring crRNA comprises a variable targeting domain    capable of hybridizing to said target sequence.

-   23. A guide RNA capable of forming a guide RNA/Cas endonuclease    complex, wherein said guide RNA/Cas endonuclease complex can    recognize, bind to, and optionally nick or cleave a target sequence,    wherein said guide RNA is a duplex molecule comprising a chimeric    non-naturally occurring crRNA and a tracrRNA, wherein said chimeric    non-naturally occurring crRNA comprises at least a fragment of a    crRNA originating from an organism selected from the group    consisting of Brevibacillus laterosporus, Lactobacillus reuteri    MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4,    Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC,    Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis,    Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811,    Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755,    wherein said chimeric non-naturally occurring crRNA comprises a    variable targeting domain capable of hybridizing to said target    sequence.

-   24. A guide RNA capable of forming a guide RNA/Cas endonuclease    complex, wherein said guide RNA/Cas endonuclease complex can    recognize, bind to, and optionally nick or cleave a target sequence,    wherein said guide RNA is a single molecule comprising a tracrRNA    linked to a chimeric non-naturally occurring crRNA comprising at    least a fragment of a crRNA originating from an organism selected    from the group consisting of Brevibacillus laterosporus,    Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM 15814,    Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM 14932,    Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM 20210,    Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02,    Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and    Psychroflexus torquis ATCC 700755, wherein said chimeric    non-naturally occurring crRNA comprises a variable targeting domain    capable of hybridizing to said target sequence.

-   25. A guide RNA/Cas endonuclease complex comprising a Cas9    endonuclease originating from an organism selected from the group    consisting of Brevibacillus laterosporus, Lactobacillus reuteri    MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4,    Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC,    Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis,    Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811,    Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755,    and at least one guide RNA, wherein said guide RNA/Cas9 endonuclease    complex is capable of recognizing, binding to, and optionally    nicking or cleaving all or part of a target sequence.

-   26. The guide RNA/Cas endonuclease complex of embodiment 25    comprising at least one guide RNA of any one of embodiments 21-24.

-   27. The guide RNA/Cas endonuclease complex of embodiment 25, wherein    said target sequence is located in the genome of a cell.

-   28. The guide RNA/Cas endonuclease complex of embodiment 25, wherein said Cas endonuclease is a Cas9 endonuclease selected from the group consisting of SEQ ID NOs: 35 and 81-91, or a functional fragment thereof, wherein said guide RNA/Cas9 endonuclease complex is capable of recognizing, binding to, and optionally nicking or cleaving all or part of a specific DNA target sequence.

-   29. A method for modifying a target site in the genome of a cell,    the method comprising providing to said cell at least one Cas9    endonuclease originating from an organism selected from the group    consisting of Brevibacillus laterosporus, Lactobacillus reuteri    MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4,    Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC,    Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis,    Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811,    Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755,    and at least one guide RNA, wherein said guide RNA and Cas    endonuclease can form a complex that is capable of recognizing,    binding to, and optionally nicking or cleaving all or part of said    target site.

-   30. The method of embodiment 29, further comprising identifying at least one cell that has a modification at said target site, wherein the modification at said target site is selected from the group consisting of (i) a replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, and (iv) any combination of (i)-(iii).

-   31. A method for editing a nucleotide sequence in the genome of a    cell, the method comprising providing to said cell at least one Cas9    endonuclease originating from an organism selected from the group    consisting of Brevibacillus laterosporus, Lactobacillus reuteri    MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4,    Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC,    Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis,    Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811,    Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755,    a polynucleotide modification template, and at least one guide RNA,    wherein said polynucleotide modification template comprises at least    one nucleotide modification of said nucleotide sequence, wherein    said guide RNA and Cas endonuclease can form a complex that is    capable of recognizing, binding to, and optionally nicking or    cleaving all or part of said target site.

-   32. A method for modifying a target site in the genome of a cell,    the method comprising providing to said cell at least one guide RNA,    at least one donor DNA, and at least one Cas9 endonuclease    originating from an organism selected from the group consisting of    Brevibacillus laterosporus, Lactobacillus reuteri MIc3,    Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4,    Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC,    Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis,    Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811,    Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755,    wherein said at least one guide RNA and at least one Cas    endonuclease can form a complex that is capable of recognizing,    binding to, and optionally nicking or cleaving all or part of said    target site, wherein said donor DNA comprises a polynucleotide of    interest.

-   33. The method of embodiment 32, further comprising identifying at least one cell that has said polynucleotide of interest integrated in or near said target site.

-   34. The method of any one of embodiments 29-33, wherein the cell is    selected from the group consisting of a human, non-human, animal,    bacterial, fungal, insect, yeast, non-conventional yeast, and plant    cell.

-   35. The method of embodiment 34, wherein the plant cell is selected    from the group consisting of a monocot and dicot cell.

-   36. The method of embodiment 35, wherein the plant cell is selected from the group consisting of a maize, rice, sorghum, rye, barley, wheat, millet, oats, sugarcane, turfgrass, switchgrass, soybean, canola, alfalfa, sunflower, cotton, tobacco, peanut, potato, Arabidopsis, and safflower cell.

-   37. A plant comprising a modified target site, wherein said plant    originates from a plant cell comprising a modified target site    produced by the method of any of embodiments 29-36.

-   38. A plant comprising an edited nucleotide, wherein said plant    originates from a plant cell comprising an edited nucleotide    produced by the method of embodiment 31.

-   39. A method for designing a single guide RNA, the method comprising:
    -   a) aligning a tracrRNA sequence with a CRISPR array repeat sequence from a genomic locus of an organism, wherein said CRISPR array repeat sequence comprises a crRNA sequence;
    -   b) deducing the transcriptional direction of the CRISPR array, thereby also deducing the crRNA sequence; and,
    -   c) designing a single guide RNA comprising said tracrRNA and crRNA sequences.

-   40. A method for producing target sequences, the method comprising:
    -   a) identifying a polynucleotide of interest;
    -   b) introducing a Protospacer-Adjacent-Motif (PAM) sequence adjacent to said polynucleotide of interest, wherein said PAM sequence comprises the nucleotide sequence NNNNCND, thereby creating a target site for a guide RNA/Cas9 endonuclease complex; and,
    -   c) identifying a polynucleotide of interest.

-   41. The method of embodiment 40, wherein the guide RNA/Cas9 endonuclease complex comprises at least one Cas9 endonuclease originating from an organism selected from the group consisting of Brevibacillus laterosporus, Lactobacillus reuteri MIc3, Lactobacillus rossiae DSM 15814, Pediococcus pentosaceus SL4, Lactobacillus nodensis JCM 14932, Sulfurospirillum sp. SCADC, Bifidobacterium thermophilum DSM 20210, Loktanella vestfoldensis, Sphingomonas sanxanigenens NX02, Epilithonimonas tenax DSM 16811, Sporocytophaga myxococcoides and Psychroflexus torquis ATCC 700755, wherein said guide RNA/Cas9 endonuclease complex is capable of recognizing, binding to, and optionally nicking or cleaving all or part of a target sequence.

-   42. A method for producing a plasmid DNA library containing a randomized Protospacer-Adjacent-Motif (PAM) sequence, the method comprising transforming at least one host cell with a ligation product and recovering multiple host cell colonies representing the plasmid library, wherein said ligation product was generated by contacting a library of linear oligoduplexes with a linearized plasmid, wherein each oligoduplex member of said library of oligoduplexes comprises a first single stranded oligonucleotide comprising a target sequence, and a second single stranded oligonucleotide comprising a randomized PAM sequence adjacent to a nucleotide sequence capable of hybridizing with said target sequence.

-   43. A method for identification of a Protospacer-Adjacent-Motif (PAM), the method comprising:
    -   a) providing a library of plasmids, wherein each one of said plasmids comprises a randomized Protospacer-Adjacent-Motif sequence integrated adjacent to a target sequence that can be recognized by a guide RNA/Cas endonuclease complex;
    -   b) producing a 3 prime (3′) or 5 prime (5′) overhang in the target sequence of (a) by providing to the plasmids of (a) a 3 prime deoxy-adenine, a guide RNA and a Cas endonuclease protein, wherein said guide RNA and Cas endonuclease can form a complex that is capable of introducing a double strand break into said target sequence;
    -   c) ligating adapters to the 3 prime or 5 prime overhang of (b), thereby creating a library of cleaved targets that can be amplified;
    -   d) amplifying the library of cleaved targets such that cleaved products containing the randomized PAM sequence are enriched;
    -   e) sequencing the library of (a) and the library of enriched PAM-sided targets of (d) and identifying the nucleotide sequence adjacent to the cleaved targets of (b) on either strand of the plasmid DNA, wherein said nucleotide sequence represents a putative Protospacer-Adjacent-Motif sequence; and,
    -   f) determining the fold enrichment of each nucleotide within the putative Protospacer-Adjacent-Motif sequence relative to the plasmid DNA library of (a).

-   44. A single guide RNA selected from the group consisting of SEQ ID    NOs: 47, 127, 114-125, and 128-139.

-   45. A single guide RNA capable of forming a guide RNA/Cas9    endonuclease complex, wherein said guide RNA/Cas9 endonuclease    complex can recognize, bind to, and optionally nick or cleave a    target sequence, wherein said single guide RNA is selected from the    group consisting of SEQ ID NOs: 128, 129, 130, 131, 132, 133, 134,    135, 136, 137, 138 and 139.

-   46. A single guide RNA capable of forming a guide RNA/Cas9    endonuclease complex, wherein said guide RNA/Cas9 endonuclease    complex can recognize, bind to, and optionally nick or cleave a    target sequence, wherein said single guide RNA comprises a chimeric    non-naturally occurring crRNA linked to a tracrRNA, wherein said    tracrRNA comprises a nucleotide sequence selected from the group    consisting of SEQ ID NOs: 173, 174, 175, 176, 177, 178, 179, 180,    181, 182, 183 and 184.

-   47. A single guide RNA capable of forming a guide RNA/Cas9    endonuclease complex, wherein said guide RNA/Cas9 endonuclease    complex can recognize, bind to, and optionally nick or cleave a    target sequence, wherein said single guide RNA comprises a chimeric    non-naturally occurring crRNA linked to a tracrRNA, wherein said    chimeric non-naturally occurring crRNA comprises a nucleotide    sequence selected from the group consisting of SEQ ID NOs: 149, 150,    151, 152, 153, 154, 155, 156, 157, 158, 159 and 160.

-   48. A guide RNA capable of forming a guide RNA/Cas9 endonuclease complex, wherein said guide RNA/Cas9 endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said guide RNA is a duplex molecule comprising a chimeric non-naturally occurring crRNA and a tracrRNA, wherein said chimeric non-naturally occurring crRNA comprises a variable targeting domain capable of hybridizing to said target sequence, wherein said tracrRNA comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183 and 184.

-   49. A guide RNA capable of forming a guide RNA/Cas9 endonuclease    complex, wherein said guide RNA/Cas9 endonuclease complex can    recognize, bind to, and optionally nick or cleave a target sequence,    wherein said guide RNA is a duplex molecule comprising a chimeric    non-naturally occurring crRNA and a tracrRNA, wherein said chimeric    non-naturally occurring crRNA comprises a nucleotide sequence    selected from the group consisting of SEQ ID NOs: 149, 150, 151,    152, 153, 154, 155, 156, 157, 158, 159 and 160, wherein said    chimeric non-naturally occurring crRNA comprises a variable    targeting domain capable of hybridizing to said target sequence.

-   50. A guide RNA capable of forming a guide RNA/Cas9 endonuclease    complex, wherein said guide RNA/Cas9 endonuclease complex can    recognize, bind to, and optionally nick or cleave a target sequence,    wherein said guide RNA is a duplex molecule comprising a chimeric    non-naturally occurring crRNA and a tracrRNA, wherein said tracrRNA    comprises a nucleotide sequence selected from the group consisting    of SEQ ID NOs: 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183    and 184, wherein said chimeric non-naturally occurring crRNA    comprises a nucleotide sequence selected from the group consisting    of SEQ ID NOs: 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159    and 160, wherein said chimeric non-naturally occurring crRNA    comprises a variable targeting domain capable of hybridizing to said    target sequence.

-   51. A guide RNA/Cas9 endonuclease complex comprising a Cas9    endonuclease selected from the group consisting of SEQ ID NOs: 81,    82, 83, 84, 85, 86, 87, 88, 89, 90 and 91, or a functional fragment    thereof, and at least one guide RNA, wherein said guide RNA/Cas9    endonuclease complex is capable of recognizing, binding to, and    optionally nicking or cleaving all or part of a target sequence.

-   52. A guide RNA/Cas9 endonuclease complex comprising at least one    guide RNA and a Cas9 endonuclease, wherein said Cas9 endonuclease is    encoded by a DNA sequence selected from the group consisting of SEQ    ID NOs: 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, and 80, wherein said    guide RNA/Cas9 endonuclease complex is capable of recognizing,    binding to, and optionally nicking or cleaving all or part of a    target sequence.

-   53. The guide RNA/Cas9 endonuclease complex of embodiment 7, wherein    said guide RNA is selected from the group consisting of SEQ ID NOs:    128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138 and 139.

-   54. The guide RNA/Cas9 endonuclease complex of embodiment 7, wherein said target sequence is located in the genome of a cell.

-   55. A method for modifying a target site in the genome of a cell,    the method comprising providing to said cell at least one Cas9    endonuclease selected from the group consisting of SEQ ID NOs: 81,    82, 83, 84, 85, 86, 87, 88, 89, 90 and 91, or a functional fragment    thereof, and at least one guide RNA, wherein said guide RNA and Cas9    endonuclease can form a complex that is capable of recognizing,    binding to, and optionally nicking or cleaving all or part of said    target site.

-   56. The method of embodiment 10, further comprising identifying at least one cell that has a modification at said target site, wherein the modification at said target site is selected from the group consisting of (i) a replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, and (iv) any combination of (i)-(iii).

-   57. A method for editing a nucleotide sequence in the genome of a    cell, the method comprising providing to said cell at least one Cas9    endonuclease selected from the group consisting of SEQ ID NOs: 81,    82, 83, 84, 85, 86, 87, 88, 89, 90 and 91, or a functional fragment    thereof, a polynucleotide modification template, and at least one    guide RNA, wherein said polynucleotide modification template    comprises at least one nucleotide modification of said nucleotide    sequence, wherein said guide RNA and Cas9 endonuclease can form a    complex that is capable of recognizing, binding to, and optionally    nicking or cleaving all or part of said target site.

-   58. A method for modifying a target site in the genome of a cell,    the method comprising providing to said cell at least one guide RNA,    at least one donor DNA, and at least one Cas9 endonuclease selected    from the group consisting of SEQ ID NOs: 81, 82, 83, 84, 85, 86, 87,    88, 89, 90 and 91, or a functional fragment thereof, wherein said at    least one guide RNA and at least one Cas9 endonuclease can form a    complex that is capable of recognizing, binding to, and optionally    nicking or cleaving all or part of said target site, wherein said    donor DNA comprises a polynucleotide of interest.

-   59. The method of embodiments 11, 13 or 14, wherein said guide RNA    is selected from the group consisting of SEQ ID NOs: 128, 129, 130,    131, 132, 133, 134, 135, 136, 137, 138 and 139.

-   60. The method of embodiment 13, further comprising identifying at least one cell that has said polynucleotide of interest integrated in or near said target site.

-   61. The method of any one of embodiments 10-14, wherein the cell is    selected from the group consisting of a human, non-human, animal,    bacterial, fungal, insect, yeast, non-conventional yeast, and plant    cell.

-   62. A single guide RNA capable of forming a guide RNA/Cas9    endonuclease complex, wherein said guide RNA/Cas9 endonuclease    complex can recognize, bind to, and optionally nick or cleave a    target sequence, wherein said single guide RNA comprises a chimeric    non-naturally occurring crRNA linked to a tracrRNA, wherein said    tracrRNA comprises a nucleotide sequence selected from the group    consisting of SEQ ID NOs: 173, 174, 175, 176, 177, 178, 179, 180,    181, 182, 183 and 184, wherein said chimeric non-naturally occurring    crRNA comprises a nucleotide sequence selected from the group    consisting of SEQ ID NOs: 149, 150, 151, 152, 153, 154, 155, 156,    157, 158, 159 and 160.

-   63. A kit for binding, cleaving or nicking a target sequence in    eukaryotic cells or organisms comprising a guide RNA specific for    said target DNA, and a Cas endonuclease protein selected from the    group consisting of SEQ ID NOs: 81, 82, 83, 84, 85, 86, 87, 88, 89,    90 and 91.

-   64. A kit for cleaving a target sequence in eukaryotic cells or organisms comprising a guide RNA specific for said target DNA, and a Cas endonuclease protein, wherein said guide RNA is capable of forming a guide RNA/Cas9 endonuclease complex, wherein said guide RNA/Cas9 endonuclease complex can recognize, bind to, and optionally nick or cleave said target sequence, wherein said guide RNA is selected from the group consisting of SEQ ID NOs: 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138 and 139.

-   65. A kit for targeted mutagenesis in eukaryotic cells or organisms    comprising a guide RNA specific for said target DNA, a    polynucleotide modification template, and a Cas endonuclease    protein, wherein said guide RNA is capable of forming a guide    RNA/Cas9 endonuclease complex, wherein said guide RNA/Cas9    endonuclease complex can recognize, bind to, and optionally nick or    cleave said target sequence, wherein said guide RNA is selected from    the group consisting of SEQ ID NOs: 128, 129, 130, 131, 132, 133,    134, 135, 136, 137, 138 and 139, wherein said Cas endonuclease    protein is selected from the group consisting of SEQ ID NOs: 81, 82,    83, 84, 85, 86, 87, 88, 89, 90 and 91.

-   66. The kit of any one of embodiments 63-65, further comprising a molecule selected from the group consisting of an inhibitor of NHEJ, an activator of HDR or MMEJ repair pathways, an exogenous sequence, a homologous recombination DNA, a donor DNA, and any combination thereof.

EXAMPLES

In the following Examples, unless otherwise stated, parts and percentages are by weight and degrees are Celsius. It should be understood that these Examples, while indicating embodiments of the disclosure, are given by way of illustration only. From the above discussion and these Examples, one skilled in the art can make various changes and modifications of the disclosure to adapt it to various usages and conditions. Such modifications are also intended to fall within the scope of the appended claims.

Example 1 Design and Construction of 5N Randomized Protospacer-Adjacent-Motif (PAM) Library for Assaying Cas9 PAM Preferences

To characterize the Protospacer-Adjacent-Motif (PAM) specificity of Cas9 proteins from Type II CRISPR (clustered, regularly interspaced, short palindromic repeats)-Cas (CRISPR-associated) nucleic acid-based adaptive immune systems found in most archaea and some bacteria, a plasmid DNA library containing a section of 5 random base pairs immediately adjacent to a 20 base pair target sequence, T1 (CGCTAAAGAGGAAGAGGACA (SEQ ID NO: 1)), was developed. Randomization of the PAM sequence was achieved through the synthesis of a single oligonucleotide, GG-821N (TGACCATGATTACGAATTCNNNNNTGTCCTCTTCCTCTTTAGCGAGC (SEQ ID NO: 2)), with hand-mixing used to create a random incorporation of nucleotides across the 5 random residues (represented as N in the sequence of GG-821N). To convert the single-stranded template of GG-821N into a double-stranded DNA template for cloning into the plasmid vector, a second oligonucleotide, GG-820 (AAGGATCCCCGGGTACCGAGCTGCTCGCTAAAGAGGAAGAGGAC (SEQ ID NO: 3)), was synthesized with complementarity to the 3′ end of GG-821N to form a partial oligonucleotide duplex (oligoduplex I) as depicted in FIG. 1. The partial duplex was then extended by PCR using DreamTaq polymerase (Thermo Fisher Scientific) to generate a full duplex containing the target sequence, 5 randomized base pairs (NNNNN) downstream of the target sequence, and a cleavage site for the BamHI restriction enzyme (oligoduplex II in FIG. 1). To generate the plasmid library, the oligoduplex, purified using the GeneJET PCR Purification Kit (Thermo Fisher Scientific), was digested with BamHI and ligated into pTZ57R/T vector (Thermo Fisher Scientific) pre-cleaved with BamHI. Linear pTZ57R/T vector contains a protruding ddT nucleotide at the 3′ ends, whereas PCR fragments generated with DreamTaq polymerase contain dA at the 3′ ends. Therefore, one end of the PCR fragment is ligated into the vector through BamHI sticky ends, while the other is ligated through A/T ends (FIG. 2). The E. coli DH5α strain was transformed (Ca²⁺ transformation) with the ligated plasmid library and plated onto Luria Broth (LB) containing agar. The transformation efficiency was estimated from plated dilutions. Overall, ~12,000 colonies were recovered. The colonies were harvested from the plate by gently resuspending them in liquid LB media, and plasmid DNA was purified using the GeneJET Plasmid Miniprep kit (Thermo Fisher Scientific).
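The oligonucleotide design described above can be checked computationally. The following Python sketch is illustrative only and is not part of the described protocol: it uses the sequences of SEQ ID NOs: 1-3 as given in the text, while the helper names (`reverse_complement`, `three_prime_overlap`, `extend_duplex`) are hypothetical. It confirms that the 3′ ends of GG-820 and GG-821N are complementary, assembles the top strand of the extended duplex, and enumerates the possible 5-nt PAM sequences.

```python
from itertools import product

# Sequences as given in the text (SEQ ID NOs: 1-3).
T1 = "CGCTAAAGAGGAAGAGGACA"
GG_821N = "TGACCATGATTACGAATTCNNNNNTGTCCTCTTCCTCTTTAGCGAGC"
GG_820 = "AAGGATCCCCGGGTACCGAGCTGCTCGCTAAAGAGGAAGAGGAC"

# N pairs with N so the randomized positions survive complementation.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_overlap(top: str, bottom: str) -> int:
    """Length of the longest 3' end of `top` that base-pairs with the 3' end of `bottom`."""
    rc_bottom = reverse_complement(bottom)
    for n in range(min(len(top), len(rc_bottom)), 0, -1):
        if top.endswith(rc_bottom[:n]):
            return n
    return 0


def extend_duplex(top: str, bottom: str) -> str:
    """Top strand of the full duplex after polymerase extension of the partial duplex."""
    n = three_prime_overlap(top, bottom)
    return top + reverse_complement(bottom)[n:]


top_strand = extend_duplex(GG_820, GG_821N)

# All possible 5-nt PAM sequences that the randomized region can encode.
all_pams = ["".join(p) for p in product("ACGT", repeat=5)]

print(three_prime_overlap(GG_820, GG_821N))  # length of the annealed region
print(top_strand)
print(len(all_pams))
```

Under these assumptions the annealed region is 22 nt long, the extended top strand carries T1 immediately followed by NNNNN and contains the BamHI site GGATCC, and 1,024 PAM variants are possible.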

To validate the randomness of the resulting PAM library, PCR fragments spanning the 5 bp randomized PAM region were generated by Phusion High-Fidelity DNA Polymerase (Thermo Fisher Scientific) amplification (15 cycles of a 2-step amplification protocol) using a TK-119 (GAGCTCGCTAAAGAGGAAGAGG (SEQ ID NO: 4)) and pUC-dir (GCCAGGGTTTTCCCAGTCACGA (SEQ ID NO: 5)) primer pair and 50 ng of plasmid DNA library as template. The resulting 122 bp PCR product was purified using the GeneJET PCR Purification Kit (Thermo Fisher Scientific). 40 ng of the resulting PCR product was then amplified with Phusion® High Fidelity PCR Master Mix (New England Biolabs, M0531L), adding on the sequences necessary for amplicon-specific barcodes and Illumina sequencing using “tailed” primers through two rounds of PCR, each consisting of 10 cycles. The primers used in the primary PCR reaction are shown in Table 2, and a set of primers (AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACG (Universal Forward, SEQ ID NO: 8) and CAAGCAGAAGACGGCATA (Universal Reverse, SEQ ID NO: 9)) universal to all primary PCR reactions was utilized for the secondary PCR amplification. The resulting PCR amplifications were purified with a Qiagen PCR purification spin column, their concentrations were measured with a Hoechst dye-based fluorometric assay, they were combined in an equimolar ratio, and single read 60-100 nucleotide-length deep sequencing was performed on Illumina's MiSeq Personal Sequencer with a 5-10% (v/v) spike of PhiX control v3 (Illumina, FC-110-3001) to off-set sequence bias. Only those reads containing a perfect 12 nt sequence match flanking either side of the 5 nucleotide randomized PAM sequence were retained, and their PAM sequences were used to examine the frequency and diversity of PAM sequences present in the library. The frequency of each PAM sequence was calculated by dividing the number of reads with a given PAM by the total number of reads. The PAM sequence distribution was visualized by ordering the frequency of each PAM from greatest to least and displaying them graphically, and by calculating the standard deviation of the resulting PAM frequencies relative to the average (the coefficient of variation). As shown in FIG. 4, all 1,024 possible PAM sequences were present at an average frequency of 0.10% with a coefficient of variation of 40.86%.
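The read filtering and frequency calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline actually used: the two 12 nt flanks are inferred here from the oligonucleotide sequences of Example 1 and assume reads in the target-strand orientation, the function names are hypothetical, and the population standard deviation is an assumed choice because the text does not specify which estimator was used.

```python
import re
from collections import Counter
from statistics import mean, pstdev

# Assumed flanks: the last 12 nt of target T1 and the first 12 nt downstream
# of the randomized region, both inferred from the Example 1 oligonucleotides.
LEFT_FLANK = "AGGAAGAGGACA"
RIGHT_FLANK = "GAATTCGTAATC"
PAM_PATTERN = re.compile(LEFT_FLANK + "([ACGT]{5})" + RIGHT_FLANK)


def count_pams(reads):
    """Count 5-nt PAMs in reads that match both 12 nt flanks perfectly."""
    counts = Counter()
    for read in reads:
        match = PAM_PATTERN.search(read)
        if match:
            counts[match.group(1)] += 1
    return counts


def pam_frequencies(counts):
    """Frequency of each PAM: reads with that PAM divided by total retained reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads passed the flank filter")
    return {pam: n / total for pam, n in counts.items()}


def coefficient_of_variation(freqs, n_possible=4 ** 5):
    """Standard deviation of PAM frequencies relative to their mean.

    PAMs that were never observed are counted with a frequency of zero.
    """
    values = list(freqs.values()) + [0.0] * (n_possible - len(freqs))
    average = mean(values)
    if average == 0:
        raise ValueError("all frequencies are zero")
    return pstdev(values) / average


# Toy reads for illustration only (not real sequencing data).
toy_reads = [
    "GCTAAAGAGGAAGAGGACA" + "ACGTA" + "GAATTCGTAATCATGG",
    "GCTAAAGAGGAAGAGGACA" + "ACGTA" + "GAATTCGTAATCATGG",
    "GCTAAAGAGGAAGAGGACA" + "TTGCA" + "GAATTCGTAATCATGG",
    "AGGAAGAGGACT" + "ACGTA" + "GAATTCGTAATC",  # flank mismatch, discarded
]
toy_counts = count_pams(toy_reads)
toy_freqs = pam_frequencies(toy_counts)
print(toy_counts, toy_freqs)
```

A perfectly uniform library would give a coefficient of variation of zero, so the value reported in the text quantifies how far the library departs from uniform representation.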

TABLE 2 Primary PCR primer sequences for tailing on the sequences needed for Illumina deep sequencing of initial uncut 5 bp randomized PAM pTZ57R/T library.

| Primer Name | Primer Orientation | Primary PCR Primer Sequence | SEQ ID NO. |
|---|---|---|---|
| JKYS800.1 | Forward | CTACACTCTTTCCCTACACGACGCTCTTCCGATCTAAGTGAGCTCGCTAAAGAGGAAGA | 6 |
| JKYS803 | Reverse | CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTGAATTCGAGCTCGGTACCT | 7 |

Example 2 Protein Expression and Purification of Streptococcus pyogenes, Streptococcus thermophilus CRISPR1 and Streptococcus thermophilus CRISPR3 Cas9 Proteins

To examine the PAM specificity of the Cas9 proteins from the Streptococcus pyogenes (Spy) (Jinek et al. (2012) Science 337:816-21), Streptococcus thermophilus CRISPR1 (Sth1) (Horvath et al. (2008) Journal of Bacteriology 190:1401-12) and Streptococcus thermophilus CRISPR3 (Sth3) (Horvath et al. (2008) Journal of Bacteriology 190:1401-12) Type II CRISPR-Cas systems, Spy, Sth1 and Sth3 Cas9 proteins were expressed in E. coli and purified. Briefly, the cas9 genes of the CRISPR1-Cas and CRISPR3-Cas systems of Streptococcus thermophilus (Sth1 and Sth3) were amplified from a genomic DNA sample, while the cas9 gene of Streptococcus pyogenes (Spy) was amplified from a plasmid, pMJ806 (Addgene plasmid #39312). DNA fragments encoding Sth1, Sth3 and Spy Cas9 were PCR amplified using the Sth1-dir/Sth1-rev (ACGTCTCACATGACTAAGCCATACTCAATTGGAC (SEQ ID NO: 10); ACTCGAGACCCTCTCCTAGTTTGGCAA (SEQ ID NO: 11)), Sth3-dir/Sth3-rev (GGGGGGTCTCACATGAGTGACTTAGT (SEQ ID NO: 12); AATTACTCGAGAAAATCTAGCTTAGGCTTA (SEQ ID NO: 13)) and Spy-dir/Spy-rev (AAGGTCTCCCATGGATAAGAAATACTCAATAGGCTTAG (SEQ ID NO: 14); TTCTCGAGGTCACCTCCTAGCTGACTCAAATC (SEQ ID NO: 15)) primer pairs, respectively, and ligated into a pBAD24-CHis expression vector digested at the NcoI and XhoI sites.

Sth3 and Spy Cas9 proteins were expressed in E. coli DH10HB strain grown in LB broth supplemented with ampicillin (100 mg/ml). Cells were grown at 37° C. to an OD 600 of 0.5, at which time the growth temperature was decreased to 16° C. and expression induced with 0.2% (w/v) arabinose for 20 h. Cells were pelleted and resuspended in loading buffer (20 mM KH₂PO₄ pH 7.0, 0.5 M NaCl, 10 mM imidazole, 5% glycerol) and disrupted by sonication. Cell debris was removed by centrifugation. The supernatant was loaded onto the Ni²⁺-charged 5 ml HiTrap chelating HP column (GE Healthcare) and eluted with a linear gradient of increasing imidazole concentration. The fractions containing Cas9 were pooled and subsequently loaded onto a HiTrap heparin HP column (GE Healthcare) for elution using a linear gradient of increasing NaCl concentration (from 0.5 to 1 M NaCl). The fractions containing Cas9 were pooled and dialyzed against 10 mM Bis-Tris-HCl pH 7.0, 300 mM KCl, 1 mM EDTA, 1 mM DTT, 50% (v/v) glycerol and stored at −20° C.

Example 3 Identification of PAM Preferences for Streptococcus pyogenes and Streptococcus thermophilus CRISPR3 Cas9 Proteins

To empirically examine the PAM preferences for Streptococcus pyogenes (Spy) and Streptococcus thermophilus CRISPR3 (Sth3) Cas9 proteins, the randomized PAM library described in Example 1 was subjected to digestion with purified Sth3 and Spy Cas9 proteins and guide RNA containing a variable targeting domain that hybridizes with, i.e., is complementary to, a sequence in the target DNA molecule (referred to herein as the target sequence), T1 (SEQ ID NO: 1). Sth3 and Spy Cas9-crRNA-tracrRNA complexes were assembled by mixing Cas9 protein with pre-annealed crRNA and tracrRNA duplex (Table 3) at a 1:1 molar ratio followed by incubation in a complex assembly buffer (10 mM Tris-HCl pH 7.5 at 37° C., 100 mM NaCl, 1 mM EDTA, 1 mM DTT) at 37° C. for 1 h. 1 μg of plasmid DNA library with randomized 5 bp NNNNN PAM was cleaved with 50 nM and 100 nM of Cas9 complex in a reaction buffer (10 mM Tris-HCl pH 7.5 at 37° C., 100 mM NaCl, 10 mM MgCl₂, 1 mM DTT) for 60 min. at 37° C. in a 100 μl reaction volume (FIG. 3).

TABLE 3 RNA molecules used for Sth3 and Spy Cas9-crRNA-tracrRNA complex assembly.

| Name | Sequence (5′-3′) | Origin | SEQ ID NO. |
|---|---|---|---|
| Sth3 crRNA | CGCUAAAGAGGAAGAGGACAGUUUUAGAGCUGUGUUGUUUCG | Synthetic oligonucleotide | 16 |
| Sth3 tracrRNA | GGGCGAAACAACACAGCGAGUUAAAAUAAGGCUUAGUCCGUACUCAACUUGAAAAGGUGGCACCGAUUCGGUGUUUUU | In vitro transcription | 17 |
| Spy crRNA | CGCUAAAGAGGAAGAGGACAGUUUUAGAGCUAUGCUGUUUUG | Synthetic oligonucleotide | 18 |
| Spy tracrRNA | GGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU | In vitro transcription | 19 |

To efficiently capture the blunt ends of the plasmid library generated by Sth3 or Spy cleavage, a 3′ dA was added by incubating the completed digestion reactions with 2.5 U of DreamTaq DNA Polymerase (Thermo Fisher Scientific) and 0.5 μl of 10 mM dATP (or dNTP) for an additional 30 min. at 72° C. (FIG. 3). Reaction products were purified using the GeneJET PCR Purification Kit (Thermo Fisher Scientific). Next, adapters with a 3′ dT overhang were generated by annealing TK-117 (CGGCATTCCTGCTGAACCGCTCTTCCGATCT (SEQ ID NO: 20)) and phosphorylated TK-111 (GATCGGAAGAGCGGTTCAGCAGGAATGCCG (SEQ ID NO: 21)) oligonucleotides. 100 ng of the resulting adapter was ligated to an equal concentration of the purified 3′ dA overhanging cleavage products for 1 hour at 22° C. in a 25 μl reaction volume in ligation buffer (40 mM Tris-HCl pH 7.8 at 25° C., 10 mM MgCl₂, 10 mM DTT, 0.5 mM ATP, 5% (w/v) PEG 4000, 0.5 U T4 Ligase; Thermo Fisher Scientific) (FIG. 3). Next, to selectively enrich for cleaved products containing the PAM sequence, PCR amplification was performed with a forward primer, pUC-dir (SEQ ID NO: 5), specific to the PAM side of the cleaved pTZ57R/T plasmid vector and with a reverse primer, TK-117 (SEQ ID NO: 20), specific to the ligated TK-117/TK-111 adapter sequence (FIG. 3). PCR fragments were generated by Phusion High-Fidelity DNA Polymerase (Thermo Fisher Scientific) amplification (15 cycles of a 2-step amplification protocol) with 10 μl of ligation reaction mixtures as a template (in 100 μl total volume). The resulting 131 bp PCR products amplified from the Cas9 pre-cleaved plasmid libraries were purified with the GeneJET PCR Purification Kit (Thermo Fisher Scientific) and prepared for Illumina deep sequencing as described in Example 1, except that the barcode-containing forward primers used in the primary reaction were specific to the TK-117/TK-111 adapter sequence and are shown in Table 4 (FIG. 3).
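The adapter design described above, in which annealing TK-117 to TK-111 leaves a single 3′ dT overhang that can pair with the 3′ dA added to the cleavage products, can be verified with a short Python sketch. The sequences are those of SEQ ID NOs: 20 and 21 as given in the text; the function names are hypothetical and the check is an illustration rather than part of the described method.

```python
# Adapter oligonucleotides as given in the text (SEQ ID NOs: 20 and 21).
TK_117 = "CGGCATTCCTGCTGAACCGCTCTTCCGATCT"
TK_111 = "GATCGGAAGAGCGGTTCAGCAGGAATGCCG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_overhang(top: str, bottom: str) -> str:
    """Unpaired 3' bases of `top` when `bottom` anneals flush with the 5' end of `top`."""
    paired_region = reverse_complement(bottom)
    if not top.startswith(paired_region):
        raise ValueError("bottom strand does not anneal flush with the 5' end of top")
    return top[len(paired_region):]


overhang = three_prime_overhang(TK_117, TK_111)
print(overhang)
```

With the sequences above, the 30 nt of TK-111 pair with the first 30 nt of TK-117, leaving the final T of TK-117 unpaired.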

TABLE 4 Primary PCR primer sequences for tailing on the sequences needed for Illumina deep sequencing of cleaved and adapter ligated 5 bp randomized PAM pTZ57R/T library.

| Primer Name | Digestion Experiment | Primer Orientation | Primary PCR Primer Sequence | SEQ ID NO. |
|---|---|---|---|---|
| JKYS807.1 | 50 nM Sth3 | Forward | CTACACTCTTTCCCTACACGACGCTCTTCCGATCTAAGGCGGCATTCCTGCTGAAC | 22 |
| JKYS807.2 | 100 nM Sth3 | Forward | CTACACTCTTTCCCTACACGACGCTCTTCCGATCTTTCCCGGCATTCCTGCTGAAC | 23 |
| JKYS807.3 | 50 nM Spy | Forward | CTACACTCTTTCCCTACACGACGCTCTTCCGATCTGGAACGGCATTCCTGCTGAAC | 24 |
| JKYS807.4 | 100 nM Spy | Forward | CTACACTCTTTCCCTACACGACGCTCTTCCGATCTCCTTCGGCATTCCTGCTGAAC | 25 |

The resulting Illumina-compatible libraries were then sequenced as described in Example 1. Only those reads containing a perfect 12 nt sequence match flanking either side of the 5 nucleotide randomized PAM sequence were retained, and their PAM sequences were used to examine the frequency and diversity of PAM sequences present in the Sth3 and Spy Cas9-guide RNA cleaved libraries. Given the inherent bias in the uncut library observed in FIG. 4 and described in Example 1, PAM preferences were calculated relative to the uncut library by dividing the frequency of a given PAM in the Sth3 or Spy Cas9-guide RNA digested library by the frequency of the same PAM sequence in the uncut library, with the resulting value being represented as a fold enrichment relative to the uncut control. To examine the PAM preferences of Sth3 and Spy Cas9 proteins, the percent nucleotide composition of the PAM sequences with fold enrichment relative to the uncut control was examined. As shown in FIG. 5 and FIG. 6, the canonical PAM preferences for both Sth3 and Spy Cas9 proteins, NGGNG and NGG, respectively, are observed in both the 50 nM and 100 nM digests. For Sth3 Cas9 protein, a slight preference (not previously reported) for a C or T bp at position 1 is also evident. Next, the effect of decreasing Sth3 and Spy Cas9-crRNA-tracrRNA complex concentration and digestion time on PAM preferences was examined. To this end, the minimal Cas9 concentration and shortest time at which PCR-amplified cleavage products may still be obtained from the randomized PAM plasmid library were determined. First, the reaction time was held constant at 60 minutes while the Cas9-crRNA-tracrRNA complex concentration was varied between 0.5-100 nM. Next, the Cas9-crRNA-tracrRNA complex concentration was fixed at 50 nM and the reaction time was varied between 1-60 minutes. Optimization of the cleavage reaction conditions revealed that the concentration and cleavage time for Sth3 and Spy Cas9 complexes could be reduced to 0.5 nM (at a 60 min. incubation time) or 1 min. (at a 50 nM concentration of Cas9 complex), respectively (FIG. 7).
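The fold enrichment calculation described above can be sketched in Python as follows. This is one possible reading, offered as an illustration only: the function names are hypothetical, PAMs absent from the uncut library are skipped to avoid division by zero, and weighting each PAM by its fold enrichment when computing per-position nucleotide composition is an assumption, since the text does not state exactly how the percent composition was derived. The input frequencies in the example are invented toy values, not experimental data.

```python
def fold_enrichment(cut_freqs, uncut_freqs):
    """Frequency of each PAM in the digested library divided by its frequency
    in the uncut library. PAMs missing from the uncut library are skipped."""
    return {
        pam: freq / uncut_freqs[pam]
        for pam, freq in cut_freqs.items()
        if uncut_freqs.get(pam, 0) > 0
    }


def position_composition(enrichment, pam_len=5):
    """Percent nucleotide composition at each PAM position.

    Assumption: each PAM contributes in proportion to its fold enrichment.
    Returns a list with one {nucleotide: percent} dict per position.
    """
    grand_total = sum(enrichment.values())
    if grand_total == 0:
        raise ValueError("no enrichment values to summarize")
    totals = [dict.fromkeys("ACGT", 0.0) for _ in range(pam_len)]
    for pam, weight in enrichment.items():
        for position, nucleotide in enumerate(pam):
            totals[position][nucleotide] += weight
    return [
        {nt: 100.0 * value / grand_total for nt, value in position_totals.items()}
        for position_totals in totals
    ]


# Toy frequencies for illustration only (not data from the Examples).
toy_uncut = {"AGGAG": 0.001, "TGGTG": 0.002, "AAAAA": 0.001}
toy_cut = {"AGGAG": 0.5, "TGGTG": 0.5}

toy_enrichment = fold_enrichment(toy_cut, toy_uncut)
toy_composition = position_composition(toy_enrichment)
for position, percentages in enumerate(toy_composition, start=1):
    print(position, percentages)
```

Normalizing against the uncut library in this way means that a PAM over-represented in the starting library is not mistaken for a preferred PAM.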

To examine the PAM sequences present in the minimally digested Sth3 and Spy Cas9-guide RNA libraries, 0.5 nM-60 minute and 50 nM-1 minute PCR amplified cleavage products were purified with the GeneJET PCR Purification Kit (Thermo Fisher Scientific) and subjected to Illumina deep sequencing as described above for the 50 nM and 100 nM-60 minute Sth3 and Spy digests. As a positive control and to demonstrate the reproducibility of PAM preferences derived from our assay, the 50 nM-60 minute digests for Sth3 and Spy were repeated and Illumina deep sequenced again. PAM preference analysis was carried out as described above for the Sth3 and Spy (50 nM and 100 nM-60 minute digests), examining the percent nucleotide composition of the PAM sequences with fold enrichment relative to the uncut library. As shown in FIG. 8 and FIG. 9, the positive controls (Sth3 and Spy 50 nM-60 minute digests) demonstrated very similar trends in PAM preferences compared to those observed previously, indicating a high degree of assay reproducibility. The PAM preferences observed in the minimally digested Sth3 and Spy libraries compared to those exhibited by the respective 50 nM-60 minute positive controls are shown in FIG. 10 and FIG. 11. When the concentration of Sth3 Cas9-crRNA-tracrRNA complex is lowered to 0.5 nM, the percentage of non-canonical PAM residues cleaved by Sth3 decreases, resulting in a tightening of specificity (FIG. 10). This is most evident at positions 2 and 3, where on-nucleotide preferences for a G increase and off-nucleotide preferences decrease. A similar shift in PAM preference towards the reported PAM sequence for Spy (NGG) is observed when the Spy Cas9-crRNA-tracrRNA complex is lowered to 0.5 nM. Here the percentage of PAMs with a non-canonical A residue at position 2 declines from over 20% in the 50 nM-60 minute and 50 nM-1 minute digests to almost zero in the 0.5 nM-60 minute digest (FIG. 11).
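The percent nucleotide composition analysis referred to above can be sketched as follows. This is only an illustration: the function name, the enrichment threshold of 1.0, the unweighted counting of each enriched PAM, and the toy enrichment values are all assumptions, since the text does not specify how the composition was tallied.

```python
def position_composition(enrichment, min_fold=1.0):
    """Percent G, C, A and T at each PAM position among enriched PAMs.

    `enrichment` maps PAM sequence -> fold enrichment over the uncut library.
    Assumption: every PAM above the threshold counts once (unweighted).
    """
    enriched = [pam for pam, fold in enrichment.items() if fold > min_fold]
    if not enriched:
        return []
    table = []
    for i in range(len(enriched[0])):
        column = [pam[i] for pam in enriched]
        table.append(
            {base: 100.0 * column.count(base) / len(column) for base in "GCAT"}
        )
    return table


# Hypothetical fold-enrichment values for a 5 bp library.
toy_enrichment = {
    "AGGTG": 4.0,
    "CGGAG": 2.5,
    "TGGCG": 2.0,
    "TAGTG": 1.2,
    "ATTTA": 0.3,  # depleted, so excluded from the composition
}
composition = position_composition(toy_enrichment)
```

Here `composition[1]` describes PAM position 2; in the toy data three of the four enriched PAMs carry a G there, in the spirit of an NGGNG-like preference with some relaxation.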

Next, the effect of using a chimeric fusion of crRNA and tracrRNA (single guide RNA (sgRNA)) (Jinek et al. (2012) Science 337:816-21 and Gasiunas et al. (2012) Proc. Natl. Acad. Sci. USA 109: E2579-E2586) on Sth3 and Spy Cas9 PAM preferences was assayed. Digestion, enrichment, Illumina deep sequencing and PAM preference analysis were carried out as described above against the randomized 5 bp PAM plasmid DNA library, except that a sgRNA (Table 5) was used in place of the crRNA-tracrRNA duplex and digests were only performed with 0.5 nM of sgRNA-Cas9 complex for 60 min.

TABLE 5 RNA molecules used for Cas9-sgRNA complex assembly.

Name         Sequence (5′-3′)                                                                                                              Origin                   SEQ ID NO.
Sth3 sgRNA   GGGCGCUAAAGAGGAAGAGGACAGUUUUAGAGCUGUGUUGUUUCGGUUAAAACAACACAGCGAGUUAAAAUAAGGCUUAGUCCGUACUCAACUUGAAAAGGUGGCACCGAUUCGGUGUUUUUU   In vitro transcription   26
Spy sgRNA    GGGCGCUAAAGAGGAAGAGGACAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU                     In vitro transcription   27

As shown in FIG. 12 and FIG. 13, the PAM preferences for Sth3 and Spy Cas9 proteins (NGGNG and NGG, respectively) are nearly identical regardless of the type of guide RNA used, either a crRNA-tracrRNA duplex or sgRNA.

Example 4 Identification of PAM Preferences for Streptococcus thermophilus CRISPR1 Cas9 Protein

To empirically examine the PAM preferences for Streptococcus thermophilus CRISPR1 (Sth1) Cas9 protein with a reported PAM sequence of 7 nucleotides, NNAGAAW (Horvath et al. (2008) Journal of Bacteriology 190:1401-12), a randomized 7 bp PAM plasmid DNA library was generated as described for the 5 bp randomized PAM library in Example 1 with the following modifications. Randomization of the PAM sequence was generated through the synthesis of four oligonucleotides, GG-940-G (GTGCACGCCGGCGACGTTGGGTCAACTNNGNNNNTGTCCTCTTCCTCTTTAGCGTTTAG (SEQ ID NO: 28)), GG-940-C (GTGCACGCCGGCGACGTTGGGTCAACTNNCNNNNTGTCCTCTTCCTCTTTAGCGTTTAG (SEQ ID NO: 29)), GG-940-A (GTGCACGCCGGCGACGTTGGGTCAACTNNANNNNTGTCCTCTTCCTCTTTAGCGTTTAG (SEQ ID NO: 30)) and GG-940-T (GTGCACGCCGGCGACGTTGGGTCAACTNNTNNNNTGTCCTCTTCCTCTTTAGCGTTTAG (SEQ ID NO: 31)), with hand-mixing used to create a random incorporation of nucleotides across the random residues (represented as N). The randomized single stranded oligonucleotides were each separately converted into double-stranded DNA templates for cloning into the plasmid vector using a second oligonucleotide, GG-939 (GACTAGACCTGCAGGGGATCCCGTCGACAAATTCTAAACGCTAAAGAGGAAGAGGAC (SEQ ID NO: 126)), with complementation to the 3′ end of GG-940-G, GG-940-C, GG-940-A and GG-940-T, and by PCR extension with DreamTaq polymerase (Thermo Fisher Scientific) (oligoduplexes I & II FIG. 1). To avoid cleavage of some species of the randomized positions, the resulting double-stranded templates were each digested with an 8 bp cutting restriction endonuclease, SdaI, so that overhangs were present at each end: a PstI compatible overhang and a Taq added single 3′ A overhang. The resulting overhangs were used to directionally ligate the 4 double-stranded templates into pTZ57R/T (Thermo Fisher Scientific) pre-cleaved with PstI. The ligations were Ca²⁺ transformed into DH5α E. coli cells, and plasmid DNA was recovered and combined from each of the 4 transformants derived from GG-940-G, GG-940-C, GG-940-A and GG-940-T to generate the randomized 7 bp NNNNNNN PAM plasmid DNA library.

PAM preference experiments with Sth1 Cas9 protein on the resulting 7 bp randomized PAM plasmid DNA library were carried out similarly to those described in Example 3 for the Streptococcus thermophilus CRISPR3 (Sth3) and Streptococcus pyogenes (Spy) Cas9 proteins (against the 5 bp randomized PAM library). Briefly, Sth1 Cas9-crRNA-tracrRNA complexes were assembled by mixing Cas9 protein with pre-annealed crRNA and tracrRNA duplex (Table 6) at 1:1 molar ratio followed by incubation in a complex assembly buffer (10 mM Tris-HCl pH 7.5 at 37° C., 100 mM NaCl, 1 mM EDTA, 1 mM DTT) at 37° C. for 1 h. Digests were performed using 1 μg of randomized 7 bp PAM library with 50 nM Sth1 crRNA-tracrRNA-Cas9 complexes at 37° C. for 60 min., 50 nM Sth1 crRNA-tracrRNA-Cas9 complexes at 37° C. for 1 min. and 0.5 nM Sth1 crRNA-tracrRNA-Cas9 complexes at 37° C. for 60 min. (FIG. 3). As a positive control, 1 μg of the randomized 7 bp PAM library was also digested with Sth3 and Spy Cas9-sgRNA complexes (0.5 nM at 37° C. for 60 min.). A 3′ dA was added to the blunt ends of the cleaved fragments (FIG. 3). Next, duplexed adapter TK-117/TK-111 with a 3′ dT overhang was ligated to the A overhang (FIG. 3). Then, PCR was set up using primers pUC-dir (SEQ ID NO: 5) and TK-117 (SEQ ID NO: 20) to enrich for PAM sequences that supported cleavage (FIG. 3). 40 ng of the resulting PCR product was then amplified with Phusion® High Fidelity PCR Master Mix (New England Biolabs, M0531L), adding on the sequences necessary for amplicon-specific barcodes and Illumina sequencing using “tailed” primers through two rounds of PCR, each consisting of 10 cycles (FIG. 3). The sequences of the barcode specific forward primers used in the primary PCR reaction were similar to those listed in Table 3, and the reverse primer, JKYS812 (CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCGGCGACGTTGGGTC (SEQ ID NO: 32)), was paired with each of the forward primers. A set of primers, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACG (Universal Forward, SEQ ID NO: 8) and CAAGCAGAAGACGGCATA (Universal Reverse, SEQ ID NO: 9), universal to all primary PCR reactions, was utilized for the secondary PCR amplification.

TABLE 6 RNA molecules used for Sth1 Cas9-crRNA-tracrRNA complex assembly.

Name            Sequence (5′-3′)                                                                   Origin                      SEQ ID NO.
Sth1 crRNA      CGCUAAAGAGGAAGAGGACAGUUUUUGUACUCUCAAGAUUUA                                         Synthetic oligonucleotide   33
Sth1 tracrRNA   GGGUAAAUCUUGCAGAAGCUACAAAGAUAAGGCUUCAUGCCGAAAUCAACACCCUGUCAUUUUAUGGCAGGGUGUUUUCG   In vitro transcription      34

The resulting PCR amplifications were prepared and Illumina deep sequenced as described in Example 1, and PAM preference analysis was carried out as described in Example 3 for the Sth3 and Spy (50 nM and 100 nM-60 minute digests), examining the percent nucleotide composition of the PAM sequences with fold enrichment relative to the uncut library. As shown in FIG. 14 and FIG. 15, the PAM preferences for the positive controls, Sth3 and Spy Cas9 proteins, are nearly identical regardless of the length of the randomized PAM plasmid DNA library used, either 5 bp or 7 bp. The PAM preferences observed for the Sth1 Cas9 protein are shown in FIG. 16 and match those previously reported (NNAGAAW). Just as observed for Sth3 and Spy Cas9 proteins, the PAM specificity of Sth1 is more relaxed at higher concentrations of guide RNA-Cas9 complex. This is most evident at position 5, where an off-preference for a C nucleotide is less prevalent at the lower 0.5 nM complex concentration.
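The read-filtering step that precedes this analysis (described in Example 3 as keeping only reads with a perfect 12 nt match on either side of the randomized PAM) might be sketched as below. The flank sequences shown are placeholders invented for illustration, not the flanks of the actual library, and the function name is likewise hypothetical.

```python
def extract_pam(read, left_flank, right_flank, pam_length):
    """Return the randomized PAM if both flanks match perfectly, else None."""
    start = read.find(left_flank)
    if start == -1:
        return None
    pam_start = start + len(left_flank)
    pam_end = pam_start + pam_length
    # The right flank must follow the PAM window immediately and exactly.
    if read[pam_end:pam_end + len(right_flank)] != right_flank:
        return None
    return read[pam_start:pam_end]


# Placeholder 12 nt flanks (hypothetical; the real flanks come from the library design).
LEFT_FLANK = "AAACCCGGGTTT"
RIGHT_FLANK = "TTTGGGCCCAAA"

good_read = "CC" + LEFT_FLANK + "AGAAGAA" + RIGHT_FLANK + "GG"
bad_read = "CC" + LEFT_FLANK + "AGAAGAA" + "TTTGGGCCCAAT" + "GG"  # one flank mismatch

captured = extract_pam(good_read, LEFT_FLANK, RIGHT_FLANK, pam_length=7)
rejected = extract_pam(bad_read, LEFT_FLANK, RIGHT_FLANK, pam_length=7)
```

Passing `pam_length=5` or `pam_length=7` lets the same sketch serve either randomized library.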

The fact that the canonical PAM sequence preferences for Spy, Sth1 and Sth3 Cas9 proteins may be recapitulated with our assay regardless of the type of guide RNA used (either crRNA-tracrRNA or sgRNA) or the length of the randomized PAM sequence suggests that the in vitro PAM library assay described herein, or derivations of it, may be used to directly interrogate PAM specificity from any Cas9, assuming the guide RNA sequences, either crRNA-tracrRNA or sgRNA, may be successfully deduced. Additionally, our assay grants precise control over the amount of Cas9 protein used in the in vitro digestion assays described herein, allowing a detailed examination of Cas9 PAM specificity as a function of Cas9-guide RNA complex concentration, as evidenced by the apparent broadening in PAM specificity as Cas9-guide RNA complex concentration was increased.

Example 5 Identification of Brevibacillus laterosporus crRNA, tracrRNA and Cas9 Endonuclease

To empirically examine the PAM preferences for a Cas9 protein whose PAM was undefined, a cas9 gene from an uncharacterized Type II CRISPR-Cas system was identified by searching internal Pioneer-DuPont databases consisting of microbial genomes with the amino acid sequence of S. thermophilus CRISPR3 (Sth3) Cas9 (SEQ ID NO: 35). Amino acid alignment of Sth3 revealed 12.9% identity and 24.4% similarity at the protein level with a protein derived from a single long open-reading-frame of 3279 nucleotides (SEQ ID NO: 36) from the Brevibacillus laterosporus bacterial strain SSP360D4. Translation of the open-reading-frame encodes a protein of 1092 amino acids (not including the stop codon). Based on PFAM database searches, the protein contained HNH endonuclease and CRISPR-associated domains, all hallmarks of a Cas9 protein. The cas9 gene of SSP360D4 was also located upstream of a CRISPR array comprised of 7 repeat-spacer units (FIG. 17A). The repeat and spacer lengths (36 and 30 bp, respectively) are similar to those of other Type II CRISPR-Cas systems. However, 5 of 8 repeats contain 1 or 2 bp mutations (FIG. 17B). Sequences of Repeat1 (SEQ ID NO: 37), Repeat4 (SEQ ID NO: 40) and Repeat5 (SEQ ID NO: 41) are conserved; therefore this sequence was selected as a template for designing single guide RNAs (sgRNAs). A region upstream of the cas9 gene is partially complementary (anti-repeat) to the 5′-terminus of the repeat, suggesting a putative tracrRNA (FIG. 17A). The possible transcriptional directions of the putative tracrRNA were considered by examining the secondary structures and possible termination signals present in an RNA version of the sense and anti-sense genomic DNA sequences surrounding the anti-repeat. However, the transcriptional direction of the tracrRNA and CRISPR region could not be reliably determined bioinformatically, so a method described in Example 7 was designed to empirically determine the appropriate directions of transcription. Other genes typically found in a Type II CRISPR-Cas locus were either truncated, as was the case for cas1, or missing (FIG. 17A).

Example 6 Protein Expression and Purification of Brevibacillus laterosporus Cas9 Protein

To examine the PAM specificity and guide RNA of Brevibacillus laterosporus (Blat) Cas9 protein with the in vitro cleavage assays described in Examples 7 & 8, Blat Cas9 protein was expressed in E. coli and purified. Briefly, a DNA fragment encoding the Brevibacillus laterosporus Cas9 protein was PCR amplified directly from the Pioneer-DuPont strain, SSP360D4, using the Blat-Cas9-dir (TACCATGGCATACACAATGGGAATAGATG (SEQ ID NO: 45)) and Blat-Cas9-rev (TTCTCGAGACGACTAGTTGATTTAATCGAATTGAC (SEQ ID NO: 46)) primer pair and cloned into a pBAD24-CHis expression vector pre-cleaved over NcoI and XhoI sites. To establish optimal expression conditions, three different E. coli strains, BL21 (DE3), DH10B and Rosetta (DE3), were analyzed. The highest expression yield of soluble Blat Cas9 protein was obtained in the BL21 (DE3) strain.

For purification, Blat Cas9 protein was expressed in E. coli BL21 (DE3) strain grown in LB broth supplemented with ampicillin (100 mg/ml). Cells were grown at 37° C. to an OD600 of 0.5, at which time the growth temperature was decreased to 16° C. and expression induced with 0.2% (w/v) arabinose for 20 h. Cells were pelleted and resuspended in loading buffer (20 mM KH₂PO₄ pH 7.0, 0.5 M NaCl, 10 mM imidazole, 5% glycerol) and disrupted by sonication. Cell debris was removed by centrifugation. The supernatant was loaded onto the Ni²⁺-charged 5 ml HiTrap chelating HP column (GE Healthcare) and eluted with a linear gradient of increasing imidazole concentration. The fractions containing Cas9 were pooled and subsequently loaded onto a heparin column for elution using a linear gradient of increasing NaCl concentration (from 0.5 to 1 M NaCl). The fractions containing Cas9 were pooled and dialyzed against 10 mM Bis-Tris-HCl pH 7.0, 300 mM KCl, 1 mM EDTA, 1 mM DTT, 50% (v/v) glycerol and stored at −20° C.

Example 7 Determination of Guide RNAs for the Cas9 of Brevibacillus laterosporus

To determine a guide RNA for the Cas9 protein identified in the Brevibacillus laterosporus (Blat) Type II CRISPR-Cas system, we designed two single guide RNA (sgRNA) variants to account for both possible expression scenarios of the tracrRNA and CRISPR array (FIG. 18 & Table 7) and used them to probe which expression scenario supported cleavage activity of Blat Cas9 in the 7 bp randomized PAM plasmid DNA library from Example 4.

sgRNAs were designed by first identifying the boundaries of the putative tracrRNA molecules by analyzing regions which were partially complementary to the 22 nt 5′ terminus of the repeat (anti-repeat). Next, to determine the 3′ end of the tracrRNA, possible secondary structures and terminators were used to predict the region of termination in the downstream fragment (FIGS. 19 and 20) using Mini-fold (Markham et al. (2008) Methods in Molecular Biology 453: 3-31). The sgRNAs contained a T7 polymerase transcription initiation recognition signal at the 5′ end followed by a 20 nt target recognition sequence, 16 nt of crRNA repeat, a 4 nt self-folding hairpin loop and an anti-repeat sequence complementary to the repeat region of the crRNA, followed by the remaining 3′ part of the putative tracrRNA (Table 7). The sgRNA variant which contains a putative tracrRNA transcribed in the same direction as the cas9 gene is termed the “direct” sgRNA, while the sgRNA containing the tracrRNA transcribed in the opposite direction is termed the “reverse” sgRNA (FIG. 18).
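The 5′ to 3′ component order just described can be illustrated with a short Python sketch. The helper names are hypothetical. The example components are sliced from oligonucleotide GG-969 (Table 8); the slice boundaries are our reading of that oligonucleotide against the stated design (GGG, 20 nt target, 16 nt repeat, 4 nt loop), not a restatement of the individual component SEQ ID NOs, and the construct is truncated where GG-969 ends, so it is not a full-length sgRNA.

```python
def assemble_sgrna_template(target, repeat, loop, anti_repeat, tracr_tail,
                            t7_start="GGG"):
    """Concatenate sgRNA components 5'->3' as a DNA template."""
    if len(target) != 20:
        raise ValueError("this sketch expects a 20 nt target recognition sequence")
    if len(repeat) != 16:
        raise ValueError("this sketch expects 16 nt of crRNA repeat")
    if len(loop) != 4:
        raise ValueError("this sketch expects a 4 nt loop")
    return t7_start + target + repeat + loop + anti_repeat + tracr_tail


def transcribe(dna):
    """Write the DNA template as RNA (T -> U)."""
    return dna.replace("T", "U")


# GG-969 as listed in Table 8 (5' portion of the "direct" sgRNA gene).
GG_969 = "GGGCGCTAAAGAGGAAGAGGACAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAAC"

# Assumed component boundaries, read off the design described in the text.
T1_TARGET = GG_969[3:23]          # 20 nt target recognition sequence
REPEAT_16 = GG_969[23:39]         # 16 nt of crRNA repeat
LOOP = GG_969[39:43]              # 4 nt self-folding loop
ANTI_REPEAT_START = GG_969[43:]   # start of the anti-repeat (truncated)

template = assemble_sgrna_template(
    T1_TARGET, REPEAT_16, LOOP, ANTI_REPEAT_START, tracr_tail=""
)
rna_prefix = transcribe(template)
```

Under these assumed boundaries the reassembled template reproduces GG-969 exactly, which is a useful sanity check that the component order matches the description.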

The “direct” sgRNA encoding gene was obtained in two PCR steps. First, two fragments were generated by PCR using the GG-969/GG-839 and TK-149/TK-150 oligonucleotide primer pairs (Table 8). The fragments were purified with the GeneJET PCR Purification Kit (Thermo Fisher Scientific) and the full length sgRNA gene was assembled from these fragments by overlapping PCR using the GG-969/TK-150 primer pair. The “reverse” sgRNA encoding gene was amplified by PCR using the GG-840/GG-841 oligonucleotide primer pair (Table 8). To generate the sgRNA encoding plasmids pUC-Blat-dir-sgRNA and pUC-Blat-rev-sgRNA, the PCR fragments were cloned into pUC18 vector digested with SacI.

TABLE 7 “Direct” and “reverse” Blat sgRNAs used to deduce transcriptional direction of crRNA and tracrRNA loci.

Blat sgRNA   T7 Transcription Initiation   Variable Targeting domain (SEQ ID NO:)   16 nt of the repeat   Loop   Anti-Repeat   Remaining Putative 3′ tracrRNA Sequence   SEQ ID NO:
Direct       GGG                           193                                      195                   GAAA   197           199                                       47
Reverse      GGG                           194                                      196                   GAAA   198           200                                       48

TABLE 8 Oligonucleotides used for Blat sgRNA gene construction and sgRNA production.

Name     Sequence (5′-3′)                                                               SEQ ID NO.
GG-969   GGGCGCTAAAGAGGAAGAGGACAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAAC          49
GG-839   CTAAAAACGGGCTAGGCGATCCCCAACGCCTCGGGTCTGTTGCCCTTACTATAGCAACTTAC                50
TK-149   GATCGCCTAGCCCGTTTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGA                       (illegible)
TK-150   AAAAAAAAGCACCTCGGAAATAAATGCTCCAAGGTGCTCGTCTGTCATTATTTTGAATATGG                (illegible)
GG-840   GGGCGCTAAAGAGGAAGAGGACAATCATATCATATCGAGGAAACTTGATATGATATGATACTTTCATTTTA       53
GG-841   CATAAAATAGACAGATAAATGAGATTGACTTCGATGATATATGGATATAAAATGAAAGTATCATATCATATCAAG   54
TK-124   TAATACGACTCACTATAGGGCGCTAAAGAGGAAGAGG                                         55
TK-151   AAAAAAAAGCACCTCGGAAATAAATG                                                    56
TK-126   ATAAAATAGACAGATAAATGAGATTGACTTCG                                              57

(illegible) indicates data missing or illegible when filed.

“Direct” and “reverse” Blat sgRNAs were obtained by in vitro transcription using the TranscriptAid T7 High Yield Transcription Kit (Thermo Fisher Scientific) from the PCR fragments containing a T7 promoter at the proximal end of the RNA coding sequence. The “direct” sgRNA encoding fragment (177 nt) was generated using the TK-124/TK-151 primer pair (Table 8) with pUC-Blat-dir-sgRNA plasmid DNA as template, whereas the “reverse” sgRNA encoding fragment (118 nt) was generated using the TK-124/TK-126 primer pair with pUC-Blat-rev-sgRNA plasmid as template (Table 7). The resulting sgRNAs were purified using the GeneJET RNA Cleanup and Concentration Micro Kit (Thermo Fisher Scientific) and used for complex assembly. Blat Cas9-sgRNA complexes were assembled by mixing Cas9 protein with sgRNA at 1:1 molar ratio followed by incubation in a complex assembly buffer (10 mM Tris-HCl pH 7.5, 100 mM NaCl, 1 mM EDTA, 1 mM DTT) at 37° C. for 1 h. Blat Cas9 cleavage of the 7 bp randomized PAM plasmid DNA library was performed similarly to that described above for Spy and Sth3 Cas9 proteins (Example 3). Briefly, 50 nM of Blat Cas9 complexes, assembled using “direct” or “reverse” sgRNAs, respectively, were incubated with 1 μg plasmid DNA for 60 min at 37° C. After library digestion and addition of 3′ dA overhangs, adapters were ligated and cleavage products were PCR amplified (FIG. 3). Analysis of reaction products by agarose gel electrophoresis revealed that the “direct” sgRNA, but not the “reverse” sgRNA, supported plasmid library cleavage (FIG. 21). Single guide RNAs targeting a target site in the genome of an organism can be designed by replacing the targeting sequence of SEQ ID NO: 47 with any nucleotide sequence that can hybridize to a desired target sequence (guide RNA as shown in SEQ ID NO: 127).

Example 8 Identification of PAM Preferences for Brevibacillus laterosporus Cas9 Protein

After determining a guide RNA for Brevibacillus laterosporus (Blat) Cas9, PAM identification was performed similarly to that described in Example 3 for the Spy and Sth3 Cas9 proteins. Briefly, 1 μg of 7 bp randomized PAM plasmid library was digested with various concentrations of Blat Cas9-“direct” sgRNA complex, ranging between 0.5-50 nM, and at various reaction times, ranging from 1 to 60 minutes. Next, 3′ dA overhangs were added to the cleavage products, adapters were ligated and adapter-ligated cleavage products were PCR amplified. PCR reactions were then electrophoresed on a 1% agarose gel and visualized. As shown in FIG. 22, and similarly to that described for Sth3 and Spy Cas9 proteins, the minimal concentration and cleavage time needed to support visualization after PCR amplification were 0.5 nM (at a 60 min. incubation time) or 1 min. (at a 50 nM concentration of Cas9 complex). Next, the amplifications for the 50 nM-60 min., 50 nM-1 min. and 0.5 nM-60 min. digests were purified with the GeneJET PCR Purification Kit (Thermo Fisher Scientific) and Illumina sequencing anchors were added by two rounds of PCR as described in Example 3 for the Sth3 and Spy Cas9 proteins when examining their PAM preferences with the 7 bp randomized PAM library. The resulting Illumina compatible libraries were then sequenced as described in Example 1, and PAM preference analysis was carried out as described in Example 3 for the Sth3 and Spy (50 nM and 100 nM-60 minute digests), examining the percent nucleotide composition of the PAM sequences with fold enrichment relative to the uncut library. When the composition of the PAM sequences with ≥2 fold enrichment for the 50 nM-60 minute, 50 nM-1 minute and 0.5 nM-60 minute digests was analyzed, the consensus PAM sequence for the Blat Cas9 protein was NNNNCND (N=G, C, A or T; D=A, G or T) with a strong preference for a C at position 5 of the PAM sequence (FIG. 23). A moderate preference for an A was observed at position 7, and slight preferences for a C or T at position 4 and G, C or A over T at position 6 were also noted. Similarly to Sth1, Sth3 and Spy Cas9 proteins, the PAM specificity broadens as the Cas9-sgRNA complex concentration increases. This is most evident at position 5, where a larger proportion of PAM sequences containing an A residue support cleavage at 50 nM compared with 0.5 nM Cas9-sgRNA complexes.

To confirm the cleavage positions for the Blat Cas9 protein, we engineered the pUC18-T1-GTCCCGT-PAM plasmid containing a 20 base pair region matching the spacer T1 (SEQ ID NO: 1) followed by a PAM sequence, GTCCCGT, falling within the PAM consensus for Blat. To generate the plasmid, first the synthetic oligoduplex containing T1 and GTCCCGT PAM sequences was assembled by annealing complementary oligonucleotides GG-935 (CAAATTCTAAACGCTAAAGAGGAAGAGGACAGTCCCG (SEQ ID NO: 58)) and GG-936 (AATTCGGGACTGTCCTCTTCCTCTTTAGCGTTTAGAATTTGAGCT (SEQ ID NO: 59)) and ligated into pUC18 vector pre-cleaved with ScaI and EcoRI. 2.5 μg of the resulting plasmid was then digested with 100 nM of the Blat Cas9-sgRNA complex in 500 μl of reaction buffer at 37° C. for 60 min., purified using the GeneJET PCR Purification Kit (Thermo Fisher Scientific) and electrophoresed on an agarose gel. Linear digestion products were then purified from the agarose gel using the GeneJET Gel Extraction Kit (Thermo Fisher Scientific). The cleaved region in Blat Cas9 linearized pUC18-T1-GTCCCGT-PAM plasmid was then directly sequenced with the pUC-EheD (CCGCATCAGGCGCCATTCGCC (SEQ ID NO: 60)) and pUC-LguR (GCGAGGAAGCGGAAGAGCGCCC (SEQ ID NO: 61)) primers. The sequence results confirmed that plasmid DNA cleavage occurred in the protospacer 3 bp away from the PAM sequence (FIG. 24), similar to that observed for Sth3 and Spy Cas9 proteins.

The NNNNCND PAM sequence identified herein can be introduced adjacent to any polynucleotide of interest, thereby creating a target site that can be recognized by a guide RNA/Cas9 endonuclease complex described herein, wherein the guide RNA/Cas9 endonuclease system is capable of recognizing, binding to, and optionally nicking or cleaving all or part of the target sequence adjacent to the NNNNCND PAM sequence.
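As an illustration of how candidate target sites adjacent to an NNNNCND PAM might be located in a sequence, the following Python sketch converts the degenerate consensus into a regular expression and reports each 20 nt protospacer followed by a matching PAM. The function names and the returned fields are hypothetical; the sketch scans only the strand supplied, and the cut position assumes cleavage 3 bp from the PAM within the protospacer, as reported in Example 8.

```python
import re

# Subset of the standard degenerate nucleotide codes used in this document.
DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "D": "[AGT]", "W": "[AT]",
}


def pam_regex(pam):
    """Translate a degenerate PAM such as NNNNCND into a regex fragment."""
    return "".join(DEGENERATE[base] for base in pam)


def find_target_sites(sequence, pam="NNNNCND", protospacer_len=20):
    """List protospacer/PAM pairs on the given strand (overlaps allowed)."""
    pattern = re.compile(
        r"(?=([ACGT]{%d})(%s))" % (protospacer_len, pam_regex(pam))
    )
    sites = []
    for match in pattern.finditer(sequence):
        start = match.start()
        sites.append({
            "start": start,
            "protospacer": match.group(1),
            "pam": match.group(2),
            # Assumed cut: 3 bp upstream of the PAM, inside the protospacer.
            "cut_after": start + protospacer_len - 3,
        })
    return sites


# T1 target (from the guide RNAs in this document) followed by the GTCCCGT PAM.
T1 = "CGCTAAAGAGGAAGAGGACA"
sites = find_target_sites(T1 + "GTCCCGT")
no_sites = find_target_sites(T1 + "GTCCCGC")  # C at position 7 is outside D
```

The same helper accepts other consensus strings used in these Examples, for instance `pam="NGG"` or `pam="NNAGAAW"`.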

Example 9 Characterization of Cas9 Endonucleases and their PAM Preferences, and Cognate Guide RNAs from Diverse Organisms

The rapid in vitro methods described herein (Examples 1-8) can be used to identify and characterize Cas endonucleases from any organism and their related PAM preferences and guide RNA elements.

Cas9 proteins of Type II-A, II-B and II-C subtypes were identified from the NCBI NR database using the PSI-BLAST program (Altschul S F, et al. (1997) Nucleic Acids Res. 25:3389-3402). A phylogenetic relationship of each Cas9 protein was visualized with CLANS software (Frickey T, Lupas A. (2004) Bioinformatics 20:3702-3704) and putative Cas9 endonucleases from different groupings were selected. Genomic DNA regions derived from non-pathogenic sources and containing a clustered regularly interspaced short palindromic repeat (CRISPR) array and a putative trans-activating CRISPR RNA (tracrRNA) coding region (defined by homology to the CRISPR repeat and termed the anti-repeat) in the vicinity of the Cas9 were chosen. In total, 11 diverse genomic DNA regions were selected for further analysis (Table 9).

A schematic of the genomic locus for each system is depicted in FIGS. 25-35. The cas9 gene open-reading-frame (ORF), CRISPR array, anti-repeat (the genomic DNA region demonstrating partial homology to the repeat consensus that indicates the location of the encoded tracrRNA) and other CRISPR-Cas genes are indicated for each system. The genomic DNA sequence and length of each cas9 gene ORF and cas9 gene translation (not including the stop codon) are referenced in Table 10 for each system. Table 11 lists the consensus sequence of the CRISPR array repeats from the genomic DNA locus of each system and the sequences of the anti-repeat for each system (as genomic DNA sequence on the same strand as the cas9 gene ORF).

As was done for the Brevibacillus laterosporus (Blat) Type II CRISPR-Cas system (described in Example 5), the possible transcriptional directions of the putative tracrRNAs for each new system were considered by examining the secondary structures and possible termination signals present in an RNA version of the sense and anti-sense genomic DNA sequences surrounding the anti-repeat. Based on the hairpin-like secondary structures present for each system, the transcriptional direction of the tracrRNA was deduced for 10 of the 11 diverse Type II CRISPR-Cas systems. Because the anti-repeat in the tracrRNA can hybridize to the crRNA derived from the CRISPR array to form a duplexed RNA capable of guiding the Cas9 endonuclease to cleave invading DNA, the transcriptional direction of the CRISPR array may also be determined based on the direction of tracrRNA transcription (since double-stranded RNA hybridizes with 5′ to 3′ directionality). The deduced transcriptional directions of both the tracrRNA and CRISPR array for each system are listed in Table 11 and are depicted in FIGS. 25-35. Based on the likely transcriptional direction of the tracrRNA and CRISPR array, single guide RNAs (sgRNAs) were also designed and are shown in Table 12. For the Sulfurospirillum sp. SCADC system, where the transcriptional direction of the tracrRNA and CRISPR array could not be deduced, two sgRNAs were designed (as described in Example 7 for the Blat Type II CRISPR-Cas system), one for each possible direction of tracrRNA transcription (Table 12).

Next, the sgRNAs will be complexed with the respective purified Cas9 protein and assayed for their ability to support cleavage of the 7 bp randomized PAM plasmid DNA library (as described in Example 7 for the Blat Type II CRISPR-Cas system). If the sgRNA does not support cleavage activity, new guide RNA designs (either sgRNA or duplexed crRNA and tracrRNA, in both possible transcriptional directions of the CRISPR array and anti-repeat region) will be tested for their ability to support cleavage.

Once a guide RNA that supports Cas9 cleavage has been established, the PAM specificity of each Cas9 endonuclease can be assayed (as described in Example 8 for the Blat Type II CRISPR-Cas system). After PAM preferences have been determined, the sgRNAs may be further refined for maximal activity or cellular transcription by increasing or decreasing the tracrRNA 3′ end tail length, increasing or decreasing the crRNA repeat and tracrRNA anti-repeat length, modifying the 4 nt self-folding loop or altering the sequence composition.

TABLE 9 List of 11 organisms selected for the identification of diverse Type II CRISPR-Cas systems described herein.

Bacterial Origin                         Abbreviation   CRISPR-Cas System Subtype   Isolated from
Lactobacillus reuteri Mlc3               Lreu           II-A                        Sourdough
Lactobacillus rossiae DSM 15814          Lros           II-A                        Sourdough
Pediococcus pentosaceus SL4              Ppen           II-A                        Meat
Lactobacillus nodensis JCM 14932         Lnod           II-A                        Dairy
Sulfurospirillum sp. SCADC               Sspe           II-B                        Oil sands tailings pond
Bifidobacterium thermophilum DSM 20210   Bthe           II-C                        Dairy
Loktanella vestfoldensis                 Lves           II-C                        Lakes Ace and Pendant, Vestfold Hills, Antarctica
Sphingomonas sanxanigenens NX02          Ssan           II-C                        Isolated from soil
Epilithonimonas tenax DSM 16811          Eten           II-C                        River epilithon
Sporocytophaga myxococcoides             Smyx           II-C                        From soil, cellulose decomposing organism
Psychroflexus torquis ATCC 700755        Ptor           II-C                        Prydz Bay, Antarctica

TABLE 10 Sequence and length of the cas9 gene ORF and cas9 gene translation from each Type II CRISPR-Cas system identified by the methods described herein.

Bacterial Origin   cas9 Gene ORF (SEQ ID NO)   Length of cas9 Gene ORF (bp)   Translation of cas9 Gene ORF (not including the stop codon) (SEQ ID NO)   Length of cas9 Gene Translation (No. of Amino Acids)
Lreu               70                          4107                           81                                                                        1368
Lros               71                          4110                           82                                                                        1369
Ppen               72                          4041                           83                                                                        1346
Lnod               73                          3393                           84                                                                        1130
Sspe               74                          4086                           85                                                                        1361
Bthe               75                          3444                           86                                                                        1147
Lves               76                          3216                           87                                                                        1071
Ssan               77                          3318                           88                                                                        1105
Eten               78                          4200                           89                                                                        1399
Smyx               79                          4362                           90                                                                        1453
Ptor               80                          4530                           91                                                                        1509
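The ORF and translation lengths in Table 10 are related by simple arithmetic: an ORF of n base pairs contains n/3 codons, one of which is the stop codon. The following Python sketch, with a hypothetical helper name, checks that relationship for the Table 10 entries and for the 3279 nt Blat ORF described in Example 5.

```python
def expected_protein_length(orf_bp):
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # subtract the stop codon


# (ORF length in bp, reported number of amino acids) from Table 10.
TABLE_10 = {
    "Lreu": (4107, 1368), "Lros": (4110, 1369), "Ppen": (4041, 1346),
    "Lnod": (3393, 1130), "Sspe": (4086, 1361), "Bthe": (3444, 1147),
    "Lves": (3216, 1071), "Ssan": (3318, 1105), "Eten": (4200, 1399),
    "Smyx": (4362, 1453), "Ptor": (4530, 1509),
}

mismatches = [
    name for name, (orf_bp, n_aa) in TABLE_10.items()
    if expected_protein_length(orf_bp) != n_aa
]
blat_aa = expected_protein_length(3279)
```

An empty `mismatches` list indicates that every reported translation length is consistent with its ORF length.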

TABLE 11 CRISPR repeat consensus, anti-repeat (putative tracrRNA coding region) and deduced transcriptional directions of tracrRNA and CRISPR array relative to the cas9 gene ORF for 11 diverse Type II CRISPR-Cas systems.

Bacterial Origin   CRISPR Repeat Consensus (SEQ ID NO)   Anti-Repeat (SEQ ID NO)   tracrRNA Transcriptional Direction (Relative to the cas9 Gene ORF)   CRISPR Array Transcriptional Direction (Relative to the cas9 Gene ORF)
Lreu               92                                    103                       Antisense                                                            Sense
Lros               93                                    104                       Antisense                                                            Sense
Ppen               94                                    105                       Antisense                                                            Sense
Lnod               95                                    106                       Sense                                                                Sense
Sspe               96                                    107                       Sense/Antisense                                                      Sense/Antisense
Bthe               97                                    108                       Sense                                                                Antisense
Lves               98                                    109                       Antisense                                                            Antisense
Ssan               99                                    110                       Antisense                                                            Antisense
Eten               100                                   111                       Antisense                                                            Antisense
Smyx               101                                   112                       Antisense                                                            Sense
Ptor               102                                   113                       Antisense                                                            Antisense

TABLE 12 Examples of sgRNA components for each new diverse Type II CRISPR-Cas system described herein.

Bacterial Origin                                 T7 Transcription Initiation   Variable Targeting domain (VT)   crRNA Repeat   Loop      tracrRNA Anti-Repeat   Remaining Putative 3′ tracrRNA Sequence   SEQ ID NO:
Lreu                                             GGG                           N₂₀(*)                           149            N₄(**)    161                    173                                       128
Lros                                             GGG                           N₂₀(*)                           150            N₄(**)    162                    174                                       129
Ppen                                             GGG                           N₂₀(*)                           151            N₄(**)    163                    175                                       130
Lnod                                             GGG                           N₂₀(*)                           152            N₄(**)    164                    176                                       131
Sspe (tracrRNA Sense-crRNA Sense)                GGG                           N₂₀(*)                           153            N₄(**)    165                    177                                       132
Sspe (tracrRNA Antisense-crRNA Antisense)        GGG                           N₂₀(*)                           154            N₄(**)    166                    178                                       133
Bthe                                             GGG                           N₂₀(*)                           155            N₄(**)    167                    179                                       134
Lves                                             GGG                           N₂₀(*)                           156            N₄(**)    168                    180                                       135
Ssan                                             GGG                           N₂₀(*)                           157            N₄(**)    169                    181                                       136
Eten                                             GGG                           N₂₀(*)                           158            N₄(**)    170                    182                                       137
Smyx                                             GGG                           N₂₀(*)                           159            N₄(**)    171                    183                                       138
Ptor                                             GGG                           N₂₀(*)                           160            N₄(**)    172                    184                                       139

N₂₀(*) indicates a series of 20 nucleotides as one example of a sgRNA variable targeting domain. As described herein, the variable targeting domain of a sgRNA can vary, for example and without limitation, from at least 12 to 30 nucleotides. N₄(**) indicates a loop of 4 nucleotides such as, but not limited to, GAAA. As described herein, the length of the loop can vary from at least 3 nucleotides to 100 nucleotides.

Single guide RNAs targeting a target site in the genome of an organism can be designed by replacing the targeting sequence of any one of SEQ ID NOs: 114-125 with any nucleotide sequence that can hybridize to a desired target sequence (such as, but not limited to, guide RNAs as shown in SEQ ID NOs: 128-139).
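Swapping the variable targeting domain of an existing sgRNA can be sketched in a few lines of Python. The function name is hypothetical, the new target shown is an invented 20 nt sequence, and the scaffold is only the first 20 nt following the targeting domain of the Spy sgRNA in Table 5, truncated for brevity; the 12 to 30 nt length check follows the range stated above.

```python
def retarget_sgrna(sgrna, new_target_dna, t7_len=3, old_target_len=20):
    """Replace the variable targeting domain that follows the 5' GGG start."""
    if not 12 <= len(new_target_dna) <= 30:
        raise ValueError("variable targeting domain should be 12 to 30 nt")
    if set(new_target_dna) - set("ACGT"):
        raise ValueError("target must contain only A, C, G and T")
    new_target_rna = new_target_dna.replace("T", "U")
    scaffold = sgrna[t7_len + old_target_len:]
    return sgrna[:t7_len] + new_target_rna + scaffold


# 5' end of the Spy sgRNA from Table 5: GGG + T1 targeting domain + scaffold start.
SCAFFOLD_START = "GUUUUAGAGCUAGAAAUAGC"  # truncated, for illustration only
original = "GGG" + "CGCUAAAGAGGAAGAGGACA" + SCAFFOLD_START

NEW_TARGET = "GATTACAGATTACAGATTAC"  # hypothetical 20 nt genomic target
retargeted = retarget_sgrna(original, NEW_TARGET)
```

Because only the slice between the GGG start and the scaffold is replaced, the repeat, loop, anti-repeat and tracrRNA tail of the original guide are carried over unchanged.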

Example 10 PAM Specificity is not Greatly Influenced by the Type or Composition of the Guide RNA

As described in Examples 3 and 4, to empirically examine the PAM preferences for Streptococcus pyogenes (Spy), Streptococcus thermophilus CRISPR3 (Sth3) and Streptococcus thermophilus CRISPR1 (Sth1) Cas9 proteins, two randomized PAM libraries (described in Examples 1 and 4) were generated. The two libraries increased in size and complexity from 5 randomized base pairs (1,024 potential PAM combinations) to 7 randomized base pairs (16,384 potential PAM combinations). These randomized libraries were subjected to digestion with purified Sth3 and Spy Cas9 proteins (5N library, Example 3) and Sth1 Cas9 protein (7N library, Example 4) and a guide RNA containing a variable targeting domain T1 that hybridizes with, i.e., is complementary to, a sequence in the target DNA molecule (referred to herein as the target sequence), T1 (SEQ ID NO: 1).
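The library complexities quoted above follow from four possible bases at each randomized position, i.e. 4^n combinations for n positions. A brief Python sketch (helper names are hypothetical) confirms the counts and enumerates the possible PAMs for the smaller library.

```python
from itertools import product


def library_complexity(n_randomized):
    """Number of distinct PAMs in a library with n fully randomized positions."""
    return 4 ** n_randomized


def enumerate_pams(n_randomized):
    """All possible PAM sequences of the given length."""
    return ["".join(bases) for bases in product("GCAT", repeat=n_randomized)]


complexity_5n = library_complexity(5)
complexity_7n = library_complexity(7)
all_5n_pams = enumerate_pams(5)
```

Enumerating the full set in this way is also a convenient way to spot PAMs that are missing from, or under-represented in, an uncut library.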

To confirm that PAM specificity is independent of the type of guide RNA, duplexed crRNA:tracrRNA or single guide RNA (sgRNA), Spy, Sth3 and Sth1 Cas9 PAM preferences were examined using Cas9 sgRNA RNP complexes instead of Cas9 and crRNA:tracrRNA RNP complexes. Digestion was carried out at a single RNP complex concentration of 0.5 nM and PAM preference analysis was performed as described herein. PAM preferences were nearly identical regardless of the type of guide RNA used, either a crRNA:tracrRNA duplex or sgRNA (U.S. patent application 62/196,535, filed Jul. 24, 2015, which is incorporated herein in its entirety by reference).

To confirm that PAM specificity is not greatly influenced by the composition of the target DNA or spacer sequence, the sequence on the opposite side of the 5 or 7 bp randomized library was targeted for cleavage with a different variable targeting domain, T2-5 for the 5 bp library or T2-7 for the 7 bp library. Spy and Sth3 Cas9 proteins preloaded with sgRNAs targeting the T2 sequence were used to digest the 5 bp randomized PAM library, while Sth1 Cas9-T2 sgRNA complexes were used to digest the 7 bp randomized PAM library. PAM preferences were then assayed as described above. The PAM preferences for all 3 Cas9 proteins were nearly identical regardless of spacer and target DNA sequence (U.S. patent application 62/196,535, filed Jul. 24, 2015).

Example 11 Identification of Extended PAM Sequences

As shown in FIG. 23 (Example 8), the PAM consensus for the Blat Cas9 protein under the 0.5 nM digest conditions was NNNNCND (N=G, C, A or T; D=A, G or T) with a strong preference for a C at position 5 of the PAM sequence. A moderate preference for an A was observed at position 7, and slight preferences for a C or T at position 4 and for G, C or A over T at position 6 were also noted when closely examining FIG. 23. Similar to the Spy, Sth3 and Sth1 Cas9 proteins, the PAM specificity broadens as the Cas9-sgRNA complex concentration increases. This was most evident at position 5, where a larger proportion of PAM sequences containing an A residue support cleavage at 50 nM compared with 0.5 nM digest conditions (FIG. 23).
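As an illustrative aid only, a degenerate consensus such as NNNNCND can be checked against a candidate PAM by expanding each IUPAC ambiguity code into its permitted bases. The sketch below assumes standard IUPAC nucleotide codes; the function name matches_consensus is hypothetical and is not taken from the disclosure.

```python
# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_consensus(pam, consensus):
    """Return True if every base of pam is allowed by the consensus code at that position."""
    if len(pam) != len(consensus):
        return False
    return all(base in IUPAC[code] for base, code in zip(pam.upper(), consensus.upper()))


if __name__ == "__main__":
    # A C at position 5 satisfies NNNNCND; a T at position 5 does not.
    print(matches_consensus("GTCCCGA", "NNNNCND"))
    print(matches_consensus("GTCCTGA", "NNNNCND"))
```

Such a check is binary and does not capture the graded preferences reported in the position frequency matrices; it only tests membership in the stated consensus.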

Since Blat Cas9 may accept any base in the first 3 positions of its PAM sequence (FIG. 23), the spacer domain T1 (and corresponding variable targeting domain in the guide RNA) was shifted by 3 nucleotides to allow PAM identification to be extended from 7 to 10 bp. The shifted T1 variable targeting domain, T1-3, was incorporated into the Blat “direct” sgRNA, resulting in a sgRNA referred to as Blat sgRNA (T1-3), and PAM identification was performed as described previously for Spy, Sth3, Sth1 and Blat Cas9 proteins. PAM preference analysis revealed that the PAM specificity for Blat Cas9 can be extended out to position 8, where there is a moderate preference for an additional A (U.S. patent application 62/196,535, filed Jul. 24, 2015).

To validate the PAM specificity for Blat Cas9, plasmids were engineered to contain mutations (GTCCCGAA (reference), GTCACGAA, GTCCTGAA, GTCCCGCA, GTCCCGAC, GTCCCGCC, with mutations shown in bold and underlined, U.S. patent application 62/196,535, filed Jul. 24, 2015) in the most conserved residues of the PAM immediately downstream of a 20 base pair region matching the variable targeting domain T1. In vitro cleavage reactions with the various PAM sequences were initiated by mixing supercoiled plasmid DNA with pre-assembled Blat Cas9-sgRNA complex (1:1 v/v ratio) at 15° C. The final reaction mixture contained 3 nM plasmid, 50 nM Cas9, 10 mM Tris-HCl (pH 7.5 at 37° C.), 100 mM NaCl, 1 mM DTT and 10 mM MgCl₂ in a 100 μl reaction volume. Aliquots were removed at timed intervals and quenched with phenol/chloroform. The aqueous phase was mixed with 3× loading dye solution (0.01% (w/v) bromophenol blue and 75 mM EDTA in 50% (v/v) glycerol) and reaction products were analyzed by agarose gel electrophoresis. The amount of supercoiled (SC) form was evaluated by densitometric analysis of ethidium bromide stained gels using the software ImageJ. Values of reaction rate constants were obtained as described (Szczelkun et al, 2014, Proc. Natl. Acad. Sci. U.S.A 111: 9798-803). Replacement of the C nucleotide at the 5th position abolished plasmid DNA cleavage, confirming its key role in Blat Cas9 PAM recognition. Replacement of the A nucleotides at the 7th and 8th positions significantly reduced (43× and 12×, respectively) the cleavage rate of supercoiled plasmid, also indicating the importance of these nucleotides in Blat Cas9 PAM recognition.

To confirm the cleavage positions for the Blat Cas9 protein with an optimal PAM sequence, a plasmid was engineered that contained a 20 base pair region matching the variable targeting domain T1 followed by a PAM sequence, GTCCCGAA, falling within the PAM consensus for Blat Cas9, NNNNCNDD. Direct sequencing was used to determine the ends of the linear DNA molecule generated by the Blat Cas9 RNP complex. The sequence results confirmed that plasmid DNA cleavage occurred in the protospacer 3 nucleotides away from the PAM sequence, similar to that observed for Spy, Sth3 and Sth1 Cas9 proteins (Garneau et al, 2010, Nature 468: 67-71; Gasiunas et al, 2012, Proc. Natl. Acad. Sci. U.S.A 109: E2579-2586; Jinek et al, 2012, Science 337: 816-21).

Example 12 In Planta Genome Editing Using Blat Cas9 and sgRNA

Following elucidation of the sgRNA and PAM preferences for Blat Cas9, maize optimized Cas9 and sgRNA expression cassettes were generated for in planta testing. The Blat cas9 gene was maize codon optimized and intron 2 of the potato ST-LSI gene was inserted to disrupt expression in E. coli and facilitate optimal splicing (Libiakova et al, 2001, Plant Cell Rep. 20: 610-615). To facilitate nuclear localization of the Blat Cas9 protein in maize cells, Simian virus 40 (SV40) monopartite and Agrobacterium tumefaciens bipartite VirD2 T-DNA border endonuclease nuclear localization signals were incorporated at the amino and carboxyl-termini of the Cas9 open reading frame, respectively (U.S. patent application 62/196,535, filed Jul. 24, 2015). To express the resulting maize optimized Blat cas9 gene in a robust constitutive manner, it was operably linked to a maize Ubiquitin promoter, 5′ UTR and intron (Christensen et al, 1992, Plant Mol. Biol. 18: 675-689) and pinII terminator (An et al, 1989, Plant Cell 1: 115-122) in a plasmid DNA vector. To confer efficient sgRNA expression in maize cells, a maize U6 polymerase III promoter region from Zea mays cultivar B73, residing on chromosome 8 at position 165,535,024-165,536,023 (B73 RefGen_v3), and a terminator (TTTTTTTT) were isolated and operably fused to the 5′ and 3′ ends of a modified Blat sgRNA encoding DNA sequence. The modified Blat sgRNA contained two modifications relative to the sgRNA that was used in the in vitro studies (see Blat sgRNA (T1) direct; SEQ ID NO: 151), a T to G alteration at position 101 and a T to C modification at position 159. The changes were introduced to remove potential premature U6 polymerase III termination signals in the Blat sgRNA and were chosen to have minimal impact on the secondary structure of the sgRNA compared to the version used in the in vitro studies. For a direct comparison with the Blat Cas9 sgRNA system, equivalent Cas9 and sgRNA DNA expression vectors were also prepared for the Spy Cas9 sgRNA system.

To carefully compare the mutational efficiency resulting from the imperfect non-homologous end-joining (NHEJ) repair of DNA double-strand breaks (DSBs) resulting from Spy and Blat Cas9 cleavage, protospacer identical genomic target sites were selected by identifying targets with Spy and Blat Cas9 compatible PAMs, NGGYCVAA. Since Blat and Spy Cas9 both cleave between the 3rd and 4th bp upstream of their respective PAM, genomic targets will be cleaved at the exact same position, allowing a tighter correlation between NHEJ mutation frequency and cleavage activity. Identical variable targeting domain sequences were selected for Blat and Spy Cas9 by capturing the 18 to 21 nt sequence immediately upstream of the PAM. To ensure optimal U6 polymerase III expression and not introduce a mismatch within the sgRNA variable targeting domain (spacer), all target sequences were selected to naturally terminate in a G at their 5′ end. Targets were selected in exons 1 and 4 of the maize fertility gene Ms45 (referred to as MS45 Exon 1 and MS45 Exon 4; see also U.S. Pat. No. 5,478,369 incorporated herein by reference) and within the promoter region of the maize liguleless-1 gene (referred to as LIG34 Promoter target herein; Moreno et al, 1997, Genes and Development 11: 616-628).
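The selection criteria above (a shared NGGYCVAA PAM, an 18 to 21 nt variable targeting domain immediately upstream of the PAM, and a G at the 5′ end) can be sketched computationally. The Python below is a minimal illustration under stated assumptions: it scans only the strand supplied, uses standard IUPAC codes, and the function name find_shared_targets and the toy sequence in the usage lines are hypothetical rather than taken from the disclosure.

```python
# Standard IUPAC nucleotide ambiguity codes used to interpret the degenerate PAM.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_shared_targets(seq, pam="NGGYCVAA", min_len=18, max_len=21):
    """Find candidate targets on the supplied strand.

    Returns tuples of (variable targeting domain, PAM found, cut index), where
    the cut index is the position of the first base 3' of the expected cut,
    i.e. 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(pam) + 1):
        site = seq[i:i + len(pam)]
        if not all(base in IUPAC[code] for base, code in zip(site, pam)):
            continue
        # Try each permitted targeting-domain length; keep those starting with G.
        for length in range(min_len, max_len + 1):
            start = i - length
            if start >= 0 and seq[start] == "G":
                hits.append((seq[start:i], site, i - 3))
    return hits


if __name__ == "__main__":
    toy = "TT" + "GACCTTGACCTTGACCTTAC" + "TGGCCGAA" + "TT"  # hypothetical sequence
    for domain, found_pam, cut in find_shared_targets(toy):
        print(domain, found_pam, cut)
```

A practical search would also scan the reverse complement and screen candidates for off-target matches; those steps are omitted here for brevity.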

To rapidly examine the mutational activity of Blat Cas9 with the PAM and sgRNA identified herein, Blat and the equivalent Spy Cas9 and sgRNA DNA expression vectors were independently introduced into maize Hi-II (Armstrong & Green, 1985, Planta 164: 207-214) immature embryos (IEs) by particle gun transformation similar to that described in Ananiev et al, 2009, Chromosoma 118: 157-177. Since particle gun transformation can be highly variable, a visual marker DNA expression cassette, Ds-Red, was also co-delivered with the Cas9 and sgRNA expression vectors to aid in the selection of evenly transformed IEs. In total, 3 transformation replicates were performed on 60-90 IEs and 20-30 of the most evenly transformed IEs from each replicate were harvested 3 days after transformation. Total genomic DNA was extracted and the region surrounding the target site was PCR amplified and deep sequenced to a read depth in excess of 300,000. The resulting reads were examined for the presence of mutations at the expected site of cleavage by comparison to control experiments where only the Cas9 DNA expression cassette was transformed. Mutations arising at the expected site of cleavage for Blat Cas9 were detected, with the most prevalent types of mutations being single base pair insertions or deletions. This pattern of imprecise mutagenic repair of the double-stranded DNA cut introduced by Blat Cas9 was also observed for the Spy Cas9 (U.S. patent application 62/196,535, filed Jul. 24, 2015) and at other Cas9 sites in maize (data not shown). The mutational activity for Blat Cas9 was robust at 2 of the 3 sites tested and exceeded that of the Spy Cas9 at the Ms45 Exon 4 target site by ~30%.

In Planta Mutation Detection

The DNA region surrounding the expected site of cleavage for each Cas9-guide RNA was amplified by PCR using Phusion® High Fidelity PCR Master Mix (NEB, USA), “tailing” on the sequences necessary for amplicon-specific barcodes and Illumina sequences through two rounds of PCR each consisting of 20 cycles. The primer pairs used in the primary PCR corresponded to the Ms45 exon 1, Ms45 exon 4 and Lig34 promoter regions, respectively. A set of primers universal to the products from the primary reactions was used in the secondary PCR reaction (U.S. patent application 62/196,535, filed Jul. 24, 2015). The resulting PCR amplifications were purified with a Qiagen PCR purification spin column (Qiagen, Germany), their concentration was measured with a Hoechst dye-based fluorometric assay, they were combined in an equimolar ratio, and single read 100 nucleotide-length amplicon sequencing was performed on Illumina's MiSeq Personal Sequencer with a 5-10% (v/v) spike of PhiX control v3 (Illumina, FC-110-3001) to off-set sequence bias. Only those reads with a ≥1 nucleotide INDEL arising within the 10 nt window centered over the expected site of cleavage and not found in the negative controls were classified as mutations. Mutant reads with an identical mutation were counted and collapsed into a single read and the top 10 most prevalent mutations were visually confirmed as arising within the expected site of cleavage. The total numbers of visually confirmed mutations were then used to calculate the percentage of mutant reads based on the total number of reads of an appropriate length containing a perfect match to the barcode and forward primer.
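The read classification rule described above may be illustrated with a short sketch. The Python below is a simplified, hypothetical rendering: it assumes each read has already been aligned and summarized as either None (no INDEL) or a tuple of (start position, length, kind), it treats the 10 nt window as the half-open interval of 5 nt on either side of the cut index, and it does not reproduce the visual confirmation of the top 10 mutations. The function name and data layout are assumptions made for illustration.

```python
from collections import Counter


def mutation_frequency(read_indels, cut_site, control_indels, window=10):
    """Count INDEL-bearing reads near the cut site and return (counts, percent mutant).

    read_indels: one entry per qualifying read; None or (start, length, kind),
                 where kind is "ins" or "del" (assumed pre-computed by alignment).
    control_indels: set of INDEL tuples seen in the negative control.
    """
    half = window // 2
    lo, hi = cut_site - half, cut_site + half  # half-open window [lo, hi)
    counts = Counter()
    for indel in read_indels:
        if indel is None:
            continue
        start, length, kind = indel
        # A deletion spans `length` reference bases; an insertion occupies one junction.
        end = start + (length if kind == "del" else 1)
        overlaps_window = start < hi and end > lo
        if overlaps_window and length >= 1 and indel not in control_indels:
            counts[indel] += 1  # identical mutations collapse into one key
    total_reads = len(read_indels)
    percent = 100.0 * sum(counts.values()) / total_reads if total_reads else 0.0
    return counts, percent


if __name__ == "__main__":
    reads = [None] * 96 + [(50, 1, "del")] * 2 + [(49, 1, "ins")] + [(10, 2, "del")]
    counts, percent = mutation_frequency(reads, cut_site=50, control_indels=set())
    print(counts.most_common(10), percent)
```

The Counter keys correspond to collapsed identical mutations, so most_common(10) lists the ten most prevalent mutations that would then be inspected manually.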

Example 13 Simplified Construction of Randomized Protospacer-Adjacent-Motif (PAM) Libraries for Assaying Cas Endonuclease PAM Preferences

To simplify construction of randomized PAM libraries, a fully double-stranded DNA oligoduplex as described in Example 1 (oligoduplex II) containing a region of randomization immediately adjacent to a DNA target sequence may be used directly as template for Cas endonuclease digestion. This would eliminate the cloning of the oligoduplex II fragment into a plasmid DNA vector, allowing randomized PAM libraries to be constructed without the downstream E. coli transformation and plasmid DNA isolation steps. PAM sequences supporting Cas endonuclease cleavage in these linearized double-stranded DNA libraries would be captured and deep sequenced as described in Examples 3, 4 and 8 for Spy, Sth3, Sth1 and Blat Cas9 proteins. To identify those sequences that have truly been cleaved by a Cas endonuclease and are not just the result of adapter ligation to the end of an un-cleaved oligoduplex, an in silico enrichment step may be applied to the resulting deep sequencing reads by selecting for only those reads that contain an appropriate sequence junction resulting from Cas endonuclease cleavage and adapter ligation. Once reads harboring a PAM sequence that supported cleavage have been identified, their nucleotide composition may be analyzed similar to that described for Spy, Sth3, Sth1 and Blat Cas9 proteins in Examples 3, 4 and 8.
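One possible form of the in silico enrichment step is sketched below. It is a minimal illustration that assumes the cleavage junction can be recognized as the 3′ end of the adapter followed directly by the protospacer bases that lie between the cut and the PAM; the function name extract_cleaved_pams and all sequences shown are hypothetical placeholders, not sequences from the disclosure.

```python
def extract_cleaved_pams(reads, adapter_tail, protospacer_tail, pam_len):
    """Keep reads containing the adapter/cleaved-target junction and return their PAMs.

    adapter_tail: last bases of the ligated adapter (hypothetical placeholder).
    protospacer_tail: protospacer bases between the cut position and the PAM.
    pam_len: number of randomized PAM positions that follow the protospacer.
    """
    junction = adapter_tail + protospacer_tail
    pams = []
    for read in reads:
        idx = read.find(junction)
        if idx < 0:
            continue  # no junction: un-cleaved template or unrelated read
        pam_start = idx + len(junction)
        pam = read[pam_start:pam_start + pam_len]
        if len(pam) == pam_len:  # discard reads too short to cover the PAM
            pams.append(pam)
    return pams


if __name__ == "__main__":
    reads = [
        "GGACGTACTGCAAGAAATTT",  # junction present, full 7 nt PAM follows
        "GGACGTACCCCTGCAAGAAA",  # adapter ligated elsewhere, junction absent
        "ACGTACTGCAA",           # junction present but read too short
    ]
    print(extract_cleaved_pams(reads, "ACGTAC", "TGC", 7))
```

An exact substring match is the strictest choice; tolerating sequencing errors would require an alignment-based comparison instead.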

Example 14 Cas Endonuclease Proto-Spacer Adjacent Motifs (PAMs) May be Assayed Directly in E. coli Cell Lysate

Cas endonuclease protein produced in E. coli may be used directly upon cell lysis (without subsequent purification steps) to assay proto-spacer adjacent motif (PAM) recognition and single guide RNA (sgRNA) requirements.

Streptococcus thermophilus CRISPR1 (Sth1) and Streptococcus thermophilus CRISPR3 (Sth3) Cas9 proteins were produced in E. coli cells as described in Example 2 but without the purification steps. In brief, after cultures were grown, induced and allowed to express Cas9 protein, cell lysis was performed via sonication and cell debris was pelleted by centrifugation, resulting in a cell lysate containing soluble Cas9 protein. Cas9-guide RNA complexes were assembled by combining 20 μl of the resulting cell lysate with RiboLock RNase Inhibitor (40 U; Thermo Fisher Scientific) and 2 μg of T7 in vitro transcribed sgRNA (generated as described in Example 7) and incubating at room temperature for 15 min. To examine PAM preferences at different Cas9 concentrations, 1 μg of the 7 bp randomized PAM library (Example 4) was incubated with 10 μl of various dilutions (1-fold (undiluted), 10-fold and 100-fold) of cell lysate containing assembled Cas9 complexes in a 100 μl reaction buffer (10 mM Tris-HCl pH 7.5 at 37° C., 100 mM NaCl, 10 mM MgCl₂, 1 mM DTT) so that E. coli lysate was diluted to a final concentration of either 10-fold, 100-fold or 1000-fold, respectively. Reaction mixtures were incubated for 60 min. at 37° C., DNA was end repaired with 2.5 U T4 DNA polymerase (Thermo Fisher Scientific), RNA was digested with 1 μl RNase A/T1 Mix (Thermo Fisher Scientific) and 3′ dA was added with 2.5 U of DreamTaq DNA Polymerase (Thermo Fisher Scientific). Finally, DNA was recovered using a GeneJET PCR Purification Kit (Thermo Fisher Scientific). DNA fragments resulting from cleavage by Cas9 were tagged with adapters, captured and prepared for Illumina deep sequencing as described in Example 3 (FIG. 3). The resulting libraries were deep sequenced as described in Example 1.

PAM sequences were identified from the resulting sequence data as described in Example 3 by only selecting those reads containing a perfect 12 nt sequence match flanking either side of the 7 nt PAM sequence, capturing only those PAM sequences resulting from perfect Cas9-guide RNA target site recognition, cleavage and adapter ligation. The collection of resulting PAM sequences was then collapsed into like sequences and counted, and the frequency of each PAM supporting cleavage was calculated. To compensate for inherent bias in the initial randomized PAM libraries, the frequency of each PAM sequence was next normalized to its frequency in the starting library. Next, a PAM consensus was calculated using a position frequency matrix (PFM). This was accomplished by first aligning the collapsed PAM sequences. Then, each nucleotide (G, C, A, or T) at each position of the PAM was weighted based on the frequency of the PAM sequence with which it was associated. Finally, the total contribution of each nucleotide (G, C, A, or T) at each PAM position was summed to generate the overall probability of identifying a given nucleotide at each PAM position within the dataset.
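The normalization and position frequency matrix (PFM) calculation described above can be sketched as follows. This is an illustrative reading of the described steps, not a reproduction of the software actually used: the function name, the dictionary-based inputs, and the choice to skip PAMs absent from the starting library are assumptions.

```python
def position_frequency_matrix(cleaved_counts, start_counts):
    """Build a PFM (percent per base per position) from collapsed PAM counts.

    cleaved_counts: {PAM sequence: read count} from the digested library.
    start_counts:   {PAM sequence: read count} from the starting library.
    """
    cleaved_total = sum(cleaved_counts.values())
    start_total = sum(start_counts.values())
    # Normalize each PAM's frequency after cleavage to its starting-library frequency.
    weights = {}
    for pam, n in cleaved_counts.items():
        start_freq = start_counts.get(pam, 0) / start_total
        if start_freq == 0:
            continue  # assumption: PAMs unseen in the starting library are skipped
        weights[pam] = (n / cleaved_total) / start_freq
    if not weights:
        return []
    length = len(next(iter(weights)))
    pfm = [{base: 0.0 for base in "GCAT"} for _ in range(length)]
    # Each nucleotide is weighted by the normalized frequency of its PAM, then summed.
    for pam, weight in weights.items():
        for pos, base in enumerate(pam):
            pfm[pos][base] += weight
    total_weight = sum(weights.values())
    return [{b: 100.0 * v / total_weight for b, v in col.items()} for col in pfm]


if __name__ == "__main__":
    toy_cleaved = {"GG": 30, "AG": 10}          # hypothetical 2 nt PAM counts
    toy_start = {"GG": 10, "AG": 10, "TT": 20}  # hypothetical starting library
    for position, column in enumerate(position_frequency_matrix(toy_cleaved, toy_start), 1):
        print(position, column)
```

Each column of the result sums to 100%, matching the way the percentages are presented in the PFM tables.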

Tables 13-18 represent the position frequency matrix (PFM) and resulting PAM consensus at each position of the 7 bp randomized PAM library for the Streptococcus thermophilus CRISPR1 (Sth1) and Streptococcus thermophilus CRISPR3 (Sth3) Cas9 proteins when assayed at different concentrations of E. coli cell lysate. The nucleotide positions of the 7 bp randomized PAM library are indicated by 1, 2, 3, 4, 5, 6, and 7 in a 5′ to 3′ direction with 1 being the closest to the DNA sequence involved in spacer target site recognition. The frequency of each nucleotide (G, C, A, T) at a respective position is indicated as a %. The consensus PAM preference is listed at the bottom of the table (consensus). The numbers marked with an asterisk (*) indicate the nucleotide preference(s) at each position of the protospacer adjacent motif (PAM). The percentages in the position frequency matrix (PFM) tables represent the probability of finding the corresponding nucleotide at each position of the PAM sequence and can be used to infer the strength of PAM recognition at each position.

TABLE 13 Position frequency matrix (PFM) and PAM consensus for Streptococcus thermophilus CRISPR1 Cas9 with Cas9 protein provided via 10 fold dilution of E. coli cell lysate.

           1        2        3        4        5        6        7
G          17.69%   14.97%   22.16%   41.47%*  9.34%    8.56%    21.79%
C          27.64%   29.63%   5.67%    17.96%   28.97%   10.45%   13.89%
A          26.54%   25.79%   70.38%*  16.85%   55.56%*  64.09%*  26.22%*
T          28.13%   29.61%   1.79%    23.72%   6.13%    16.90%   38.10%*
Consensus  N        N        A        G        A        A        W

TABLE 14 Position frequency matrix (PFM) and PAM consensus for Streptococcus thermophilus CRISPR1 Cas9 with Cas9 protein provided via 100 fold dilution of E. coli cell lysate.

           1        2        3        4        5        6        7
G          19.80%   16.70%   27.37%   43.52%*  11.01%   7.87%    20.20%
C          25.74%   27.47%   6.01%    16.02%   24.04%   8.77%    12.49%
A          29.40%   25.80%   64.19%*  18.73%   59.60%*  69.09%*  27.66%*
T          25.06%   30.03%   2.43%    21.73%   5.36%    14.27%   39.65%*
Consensus  N        N        A        G        A        A        W

TABLE 15 Position frequency matrix (PFM) and PAM consensus for Streptococcus thermophilus CRISPR1 Cas9 with Cas9 protein provided via 1000 fold dilution of E. coli cell lysate.

           1        2        3        4        5        6        7
G          19.72%   16.25%   24.92%   53.70%*  10.39%   3.79%    18.40%
C          26.89%   30.09%   4.08%    13.55%   22.65%   3.32%    10.18%
A          27.92%   26.35%   70.37%*  15.20%   64.60%*  86.15%*  33.19%*
T          25.46%   27.30%   0.64%    17.55%   2.37%    6.73%    38.23%*
Consensus  N        N        A        G        A        A        W

TABLE 16 Position frequency matrix (PFM) and PAM consensus for Streptococcus thermophilus CRISPR3 Cas9 with Cas9 protein provided via 10 fold dilution of E. coli cell lysate.

           1        2        3        4        5        6        7
G          12.46%   49.67%*  80.76%*  21.03%   49.94%*  23.46%   21.96%
C          26.60%   9.72%    5.67%    15.73%   10.22%   20.97%   24.97%
A          16.71%   22.42%   8.85%    35.35%   19.75%   27.10%   25.69%
T          44.23%   18.18%   4.72%    27.89%   20.10%   28.46%   27.39%
Consensus  N        G        G        N        G        N        N

TABLE 17 Position frequency matrix (PFM) and PAM consensus for Streptococcus thermophilus CRISPR3 Cas9 with Cas9 protein provided via 100 fold dilution of E. coli cell lysate.

           1        2        3        4        5        6        7
G          12.06%   55.16%*  82.16%*  23.38%   53.61%*  23.02%   22.39%
C          28.81%   11.09%   5.10%    17.36%   10.19%   21.26%   24.06%
A          22.84%   17.33%   9.02%    31.55%   18.80%   25.87%   25.64%
T          36.28%   16.42%   3.72%    27.71%   17.40%   29.84%   27.91%
Consensus  N        G        G        N        G        N        N

TABLE 18 Position frequency matrix (PFM) and PAM consensus for Streptococcus thermophilus CRISPR3 Cas9 with Cas9 protein provided via 1000 fold dilution of E. coli cell lysate.

           1        2        3        4        5        6        7
G          12.26%   63.66%*  89.19%*  27.07%   54.77%*  26.19%   23.09%
C          30.31%   7.86%    2.78%    17.23%   9.70%    19.39%   22.85%
A          21.26%   15.31%   6.18%    29.16%   17.45%   26.56%   26.21%
T          36.17%   13.17%   1.86%    26.55%   18.08%   27.87%   27.86%
Consensus  N        G        G        N        G        N        N

As shown in Tables 13-18, all lysate dilutions yielded the canonical PAM preferences for Sth1 and Sth3 Cas9 proteins, NNAGAAW and NGGNG, respectively. Similar to the results with purified protein in Examples 3, 4 and 8, higher concentrations of lysate, and consequently of Cas9 protein, resulted in a relaxation of PAM specificity. This was most notable for the Sth3 Cas9 protein at PAM position 2, where the preference for a G residue is reduced from approximately 64% in the PFM in the 1000-fold dilution (final concentration) reaction to around 50% in the 10-fold dilution (final concentration) experiment (Tables 16-18). For Sth1 Cas9 protein, PAM positions 4, 5 and 6 were most affected by different concentrations of Cas9 protein in the lysate dilution experiments.

These data indicate that the in vitro PAM library assay described herein obtained the same results for the PAM preferences for Sth1 and Sth3 Cas9 proteins when compared to assays where the Sth1 and Sth3 Cas9 proteins are stably expressed (in vivo expressed). Hence, the in vitro PAM library assay described herein, or derivations of it, may be used to assay PAM specificity from any Cas endonuclease using unpurified Cas protein coming directly from E. coli lysate. Additionally, by diluting E. coli lysate containing Cas9 protein, the in vitro PAM library assay permits PAM specificity to be examined as a function of Cas endonuclease concentration, as is evident from the apparent broadening in PAM specificity as the concentration of E. coli lysate containing Cas9 protein was increased.

Example 15 Cas Endonuclease Proto-Spacer Adjacent Motifs (PAMs) May be Assayed Directly with In Vitro Translated Protein

Cas endonuclease protein produced by in vitro translation may be used directly (without subsequent purification steps) to assay proto-spacer adjacent motif (PAM) and single guide RNA (sgRNA) requirements.

The Streptococcus pyogenes (Spy) cas9 gene was codon optimized for expression in eukaryotes (maize) with standard methods known in the art and operably linked to the in vitro translation (IVT) vector pT7CFE1-NHIS-GST-CHA (Thermo Fisher Scientific). To eliminate expression of the HA tag, a stop codon was included between the Spy cas9 gene and the C-terminal tag. The resulting plasmid was purified by phenol:chloroform extraction to remove residual RNases and further purified by precipitation with 2 volumes of ethanol in the presence of sodium acetate. Next, Spy Cas9 protein was produced in vitro using a 1-Step Human Coupled IVT Kit (Thermo Fisher Scientific) per the manufacturer's instructions, allowing the reaction to proceed overnight at 30° C. Following the incubation, the reactions were centrifuged at 10,000 rpm for 5 min. 20 μl of supernatant containing soluble Cas9 protein was mixed with 2 μg of T7 in vitro transcribed sgRNA (generated as described in Example 7) and incubated for 15 min. at room temperature. To examine PAM preferences at different Cas9 concentrations, 1 μg of the 7 bp randomized PAM library (Example 4) was incubated with 10 μl of various dilutions (1-fold (undiluted), 10-fold and 100-fold) of in vitro translation mixtures containing assembled Cas9 complexes in a 100 μl reaction buffer (10 mM Tris-HCl pH 7.5 at 37° C., 100 mM NaCl, 10 mM MgCl₂, 1 mM DTT) so that IVT supernatant was diluted to a final concentration of either 10-fold, 100-fold or 1000-fold. Reaction mixtures were incubated for 60 min. at 37° C., DNA was end repaired with 2.5 U T4 DNA polymerase (Thermo Fisher Scientific), RNA was digested with 1 μl RNase A/T1 Mix (Thermo Fisher Scientific) and 3′ dA was added with 2.5 U of DreamTaq DNA Polymerase (Thermo Fisher Scientific). Finally, DNA was recovered using a GeneJET PCR Purification Kit (Thermo Fisher Scientific). PAM sequences supporting cleavage were captured by adapter ligation and enriched for as described in Example 3 (FIG. 3).

The resulting libraries were deep sequenced as described in Example 1. PAM sequences were identified from the resulting sequence data as described in Example 3 by only selecting those reads containing a perfect 12 nt sequence match flanking either side of the 5 or 7 nt PAM sequence, capturing only those PAM sequences resulting from perfect Cas9-guide RNA target site recognition, cleavage and adapter ligation. To compensate for inherent bias in the initial randomized PAM library, the frequency of each PAM sequence was normalized to its frequency in the starting library and a PAM consensus was then calculated with a position frequency matrix (PFM) as described in Example 14.

Tables 19-21 represent the position frequency matrix (PFM) and resulting PAM consensus at each position of the 7 bp randomized PAM library for the Streptococcus pyogenes Cas9 protein when assayed at different concentrations of in vitro translated (IVT) supernatant. The nucleotide positions of the 7 bp randomized PAM library are indicated by 1, 2, 3, 4, 5, 6, and 7 in a 5′ to 3′ direction with 1 being the closest to the DNA sequence involved in spacer target site recognition. The frequency of each nucleotide (G, C, A, T) at a respective position is indicated as a %. The consensus PAM preference is listed at the bottom of the table (consensus). The numbers marked with an asterisk (*) indicate the nucleotide preference(s) at each position of the protospacer adjacent motif (PAM). The percentages in the position frequency matrix (PFM) tables represent the probability of finding the corresponding nucleotide at each position of the PAM sequence and can be used to infer the strength of PAM recognition at each position.

TABLE 19 Position frequency matrix (PFM) and PAM consensus for Streptococcus pyogenes Cas9 with Cas9 protein provided via 10 fold dilution of in-vitro translated solution (IVT).

           1        2        3        4        5        6        7
G          24.18%   53.04%*  72.63%*  19.30%   14.19%   19.97%   23.65%
C          25.97%   7.16%    8.52%    24.26%   25.67%   28.52%   27.44%
A          25.21%   28.71%   14.69%   22.57%   23.80%   19.66%   20.39%
T          24.64%   11.09%   4.16%    33.87%   36.34%   31.85%   28.52%
Consensus  N        G        G        N        N        N        N

TABLE 20 Position frequency matrix (PFM) and PAM consensus for Streptococcus pyogenes Cas9 with Cas9 protein provided via 100 fold dilution of in-vitro translated solution (IVT).

           1        2        3        4        5        6        7
G          23.84%   52.07%*  78.60%*  21.17%   14.72%   19.66%   22.39%
C          24.16%   6.26%    4.34%    21.69%   23.72%   28.60%   27.09%
A          26.64%   34.55%   14.85%   25.33%   25.90%   20.48%   21.29%
T          25.36%   7.12%    2.21%    31.82%   35.66%   31.26%   29.23%
Consensus  N        G        G        N        N        N        N

TABLE 21 Position frequency matrix (PFM) and PAM consensus for Streptococcus pyogenes Cas9 with Cas9 protein provided via 1000 fold dilution of in-vitro translated solution (IVT).

           1        2        3        4        5        6        7
G          23.39%   81.14%*  95.35%*  27.51%   15.79%   19.98%   22.92%
C          22.34%   2.54%    0.80%    14.69%   23.08%   26.85%   25.30%
A          29.08%   12.52%   3.07%    26.65%   25.51%   22.87%   22.57%
T          25.19%   3.80%    0.78%    31.15%   35.63%   30.29%   29.22%
Consensus  N        G        G        N        N        N        N

As illustrated in Tables 19-21, the PAM preferences reported for the Spy Cas9 protein (NGG) may be recapitulated under all IVT dilutions. Similar to the results with purified protein in Examples 3, 4 and 8, higher concentrations of IVT supernatant, and consequently of Cas9 protein, resulted in a broadening of PAM specificity. This was most notable for Spy Cas9 at PAM position 2, where the frequency of a non-canonical A residue increases from approximately 13% in the PFM with the 1000-fold dilution (final concentration) reaction to around 29% in the 10-fold dilution (final concentration) experiment.

These data indicate that the in vitro translation (IVT) assay described herein obtained the same results for the PAM preferences for Spy Cas9 protein when compared to assays where the Spy Cas9 protein is stably expressed (in vivo expressed). Hence, the in vitro translation (IVT) assay described herein, or derivations of it, may be used to assay PAM specificity from any Cas endonuclease. Additionally, by diluting IVT products containing Cas9 protein, the assay permits PAM specificity to be examined as a function of Cas endonuclease concentration, as is evident from the apparent broadening in PAM specificity as the concentration of IVT supernatant containing Cas9 protein was increased.

Example 16 Guide RNA and PAM Requirements for Novel Cas Endonucleases

The single guide RNA (sgRNA) and PAM requirements of the Cas9 endonucleases from Lactobacillus reuteri MIc3 (Lreu), Lactobacillus nodensis JCM 14932 (Lnod), Sulfurospirillum sp. SCADC (Sspe), Bifidobacterium thermophilum DSM 20210 (Bthe), Loktanella vestfoldensis (Lves), Epilithonimonas tenax DSM 16811 (Eten) and Sporocytophaga myxococcoides (Smyx) (Example 9) were determined with the methods described herein.

If purified protein could not be easily obtained as described in Example 2, Cas9 protein from E. coli cell lysate as described in Example 14 or in vitro translated (IVT) Cas9 protein as described in Example 15 was utilized. Once a source of Cas9 protein was established, 1 μg of the 7 bp randomized PAM plasmid DNA library (Example 4) was subjected to Cas9-guide RNA digestion at various concentrations of either purified protein, lysate, or IVT protein. DNA fragments resulting from cleavage by Cas9 were ligated to adapters, captured and prepared for Illumina deep sequencing as described in Example 3 (FIG. 3). The resulting libraries were deep sequenced as described in Example 1. Since the position of cleavage within target sites for novel Cas9 proteins is unknown, reads were first examined for the most predominant cleavage location by examining the junction resulting from cleavage and adapter ligation. After properly defining the position of cleavage, PAM sequences were identified from the resulting sequence data as described in Example 3 by only selecting those reads containing a perfect 12 nt sequence match flanking either side of the 5 or 7 nt PAM sequence. To compensate for inherent bias in the initial randomized PAM library, the frequency of each PAM sequence was normalized to its frequency in the starting library and a PAM consensus was then calculated with a position frequency matrix (PFM) as described in Example 14. To obtain the most accurate read-out on PAM specificity and avoid conditions that are conducive to promiscuous PAM recognition (Examples 3, 4, 8, 14 and 15), the lowest concentration of Cas9 (purified, E. coli lysate or IVT supernatant) that supported cleavage was used to ascertain the PAM recognition of each Cas9 protein.
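Determining the most predominant cleavage location from adapter junctions, as described above, may be sketched as a simple tally. The Python below is only an illustration under assumptions: it locates the end of the adapter in each read, maps the bases that follow onto a reference target by exact substring match, and reports the most frequent reference position. The function name, the probe length and all sequences are hypothetical.

```python
from collections import Counter


def predominant_cut_position(reads, adapter_tail, reference, probe_len=8):
    """Return the reference index where target sequence most often resumes after the adapter.

    adapter_tail: last bases of the ligated adapter (hypothetical placeholder).
    reference:    un-cleaved target sequence on the sequenced strand.
    probe_len:    number of bases after the adapter used to locate the junction.
    """
    positions = Counter()
    for read in reads:
        idx = read.find(adapter_tail)
        if idx < 0:
            continue  # adapter not found in this read
        start = idx + len(adapter_tail)
        probe = read[start:start + probe_len]
        if len(probe) < probe_len:
            continue  # too little sequence after the adapter
        ref_idx = reference.find(probe)
        if ref_idx >= 0:
            positions[ref_idx] += 1
    return positions.most_common(1)[0][0] if positions else None


if __name__ == "__main__":
    ref = "ATGCGTACCAGTTCGAGGCTTAAC"  # hypothetical target sequence
    adapter = "CACACA"                # hypothetical adapter end
    reads = [adapter + ref[10:]] * 3 + [adapter + ref[11:]] + [ref]
    print(predominant_cut_position(reads, adapter, ref))
```

The probe must be long enough to map uniquely onto the reference; repetitive targets would call for a longer probe or a proper alignment.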

Tables 22-28 represent the position frequency matrix (PFM) and resulting PAM consensus at each position of the 7 bp randomized PAM library for several previously uncharacterized Cas9 proteins. Results derived from the lowest concentration of Cas9 coming from either purified protein, E. coli lysate or in vitro translation (IVT) supernatant that supported cleavage are shown. The nucleotide positions of the 7 bp randomized PAM library are indicated by 1, 2, 3, 4, 5, 6, and 7 in a 5′ to 3′ direction with 1 being the closest to the DNA sequence involved in spacer target site recognition. The frequency of each nucleotide (G, C, A, T) at a respective position is indicated as a %. The consensus PAM preference is listed at the bottom of the table (consensus). The numbers marked with an asterisk (*) indicate the nucleotide preference(s) at each position of the protospacer adjacent motif (PAM). The percentages in the position frequency matrix (PFM) tables represent the probability of finding the corresponding nucleotide at each position of the PAM sequence and can be used to infer the strength of PAM recognition at each position.

TABLE 22 Position frequency matrix (PFM) and PAM consensus for Lactobacillus reuteri Cas9 when purified Cas9 protein was used (0.5 nM Cas9-guide RNA complex and 60 minute digestion time).

           1           2           3           4           5           6           7
G          15.57%      83.27%*     98.90%*     31.64%      39.04%*     25.51%      15.86%
C          15.96%      2.44%       0.12%       17.94%      24.13%      26.77%      34.32%
A          17.74%      11.81%      0.66%       14.84%      11.30%      22.13%      18.37%
T          50.73%      2.48%       0.32%       35.58%      25.53%      25.58%      31.44%
Consensus  N (T > V)   G           G           N           N (G > H)   N           N

TABLE 23 Position frequency matrix (PFM) and PAM consensus for Lactobacillus nodensis Cas9 when purified Cas9 protein was used (50 nM Cas9-guide RNA complex and 60 minute digestion time).

           1           2           3           4           5           6           7
G          21.47%      13.95%      2.62%       7.92%       4.07%       5.67%       24.14%
C          25.74%      23.76%      2.07%       1.53%       1.68%       1.29%       16.67%
A          22.41%      19.73%      94.31%*     89.34%*     93.77%*     91.48%*     33.13%
T          30.38%      42.56%*     0.99%       1.22%       0.48%       1.55%       26.07%
Consensus  N           N (T > V)   A           A           A           A           N

TABLE 24 Position frequency matrix (PFM) and PAM consensus for Sulfurospirillum sp. SCADC Cas9 with Cas9 protein provided via 1000 fold dilution of in vitro translated solution (IVT).

           1        2        3        4        5        6        7
G          16.26%   97.32%*  97.67%*  18.52%   22.18%   18.86%   23.20%
C          24.43%   0.95%    0.85%    20.37%   20.90%   25.19%   22.14%
A          35.19%   1.11%    0.74%    31.97%   22.61%   26.12%   23.94%
T          24.13%   0.61%    0.74%    29.13%   34.31%   29.82%   30.72%
Consensus  N        G        G        N        N        N        N

TABLE 25 Position frequency matrix (PFM) and PAM consensus for Bifidobacterium thermophilum Cas9 when purified Cas9 protein was used (0.5 nM Cas9-guide RNA complex and 60 minute digestion time).

Position   1          2          3          4          5          6          7
G          18.93%     16.16%     20.28%     0.10%      0.03%      2.53%      3.19%
C          34.69%     31.11%     27.80%     99.55%*    99.05%*    5.34%      47.56%*
A          23.13%     28.52%     28.76%     0.13%      0.40%      91.44%*    1.17%
T          23.24%     24.20%     23.17%     0.21%      0.52%      0.69%      48.08%*
Consensus  N          N          N          C          C          A          Y

TABLE 26 Position frequency matrix (PFM) and PAM consensus for Loktanella vestfoldensis Cas9 with Cas9 protein provided via 1000 fold dilution of E. coli cell lysate.

Position   1          2          3          4          5          6          7
G          21.74%     62.30%*    51.21%*    13.71%     17.79%     32.03%*    23.70%
C          29.99%     8.00%      5.94%      10.17%     5.73%      14.72%     23.82%
A          16.37%     21.66%     37.33%*    63.65%*    64.49%*    13.01%     25.97%
T          31.89%     8.03%      5.51%      12.47%     11.99%     40.24%*    26.51%
Consensus  N          G          R (G > A)  A          A          K          N

TABLE 27 Position frequency matrix (PFM) and PAM consensus for Epilithonimonas tenax Cas9 with Cas9 protein provided via 10 fold dilution of E. coli cell lysate.

Position   1          2          3          4          5          6          7
G          30.87%*    25.83%     39.60%*    18.03%     14.19%     87.26%*    91.40%*
C          30.34%*    7.27%      3.14%      7.13%      11.68%     2.31%      2.27%
A          15.84%     63.47%*    54.64%*    71.53%*    31.83%*    3.18%      2.88%
T          22.95%     3.43%      2.61%      3.30%      42.29%*    7.25%      3.45%
Consensus  N (S > W)  A          R          A          N (W > S)  G          G

TABLE 28 Position frequency matrix (PFM) and PAM consensus for Sporocytophaga myxococcoides Cas9 when purified Cas9 protein was used (50 nM Cas9-guide RNA complex and 60 minute digestion time).

Position   1          2          3          4          5          6          7
G          10.48%     19.15%     2.54%      4.72%      1.02%      7.00%      23.48%
C          26.01%     14.45%     0.56%      23.74%     0.80%      3.23%      17.02%
A          19.94%     59.05%*    96.61%*    7.74%      97.97%*    79.56%*    28.21%
T          43.58%     7.35%      0.29%      63.80%*    0.21%      10.21%     31.28%
Consensus  N (T > V)  A          A          T          A          A          N

TABLE 29 Summary of sgRNA and PAM requirement for novel Cas endonucleases.

Bacterial Origin                        Abbreviation  PAM consensus  sgRNA SEQ ID NO:
Lactobacillus reuteri Mlc3              Lreu          Table 22       114
Lactobacillus nodensis JCM 14932        Lnod          Table 23       117
Sulfurospirillum sp. SCADC              Sspe          Table 24       119
Bifidobacterium thermophilum DSM 20210  Bthe          Table 25       120
Loktanella vestfoldensis                Lves          Table 26       121
Epilithonimonas tenax DSM 16811         Eten          Table 27       123
Sporocytophaga myxococcoides            Smyx          Table 28       124

Among the Cas9 proteins examined, both the length and composition of PAM recognition were diverse. Two of the Cas9 proteins, Lreu and Sspe (Tables 22 and 24), exhibited PAM recognition similar to the Streptococcus pyogenes (Spy) Cas9 protein, which predominantly recognizes an NGG PAM, while others exhibited very C-rich (Bthe; Table 25) or A-rich (Lnod and Smyx; Tables 23 and 28) PAM recognition. Additionally, two of the Cas9 proteins, Lves and Eten (Tables 26 and 27), yielded characteristics of both G-rich and A-rich PAM recognition.

Unlike the diversity observed for PAM recognition, the position of target site cleavage did not differ greatly and was determined to be between the 3rd and 4th bp upstream (5 prime) of the PAM for all Cas9 proteins except one, the Cas9 protein from Sulfurospirillum sp. SCADC. Interestingly, for this protein, the predominant cleavage location, determined by examining the junction resulting from cleavage and adapter ligation, was around the 7th bp upstream (5 prime) of the PAM sequence.
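To make the cleavage coordinates concrete, the following Python sketch splits a target sequence at the position that lies a given number of bp 5′ of the PAM. The function name `split_at_cut`, the `bp_upstream` parameter, and the example sequences are hypothetical; a value of 3 corresponds to a cut between the 3rd and 4th bp upstream of the PAM.

```python
def split_at_cut(target, pam_start, bp_upstream=3):
    """Split `target` at a cut that leaves `bp_upstream` bp between the cut and the PAM.

    `pam_start` is the 0-based index of the first PAM nucleotide in `target`.
    """
    cut = pam_start - bp_upstream
    if cut < 0 or pam_start > len(target):
        raise ValueError("cut position falls outside the sequence")
    return target[:cut], target[cut:]


# Hypothetical 20 nt protospacer followed by a 7 nt PAM region
protospacer = "ACGTACGTACGTACGTACGT"
pam = "TGGAGCC"
left, right = split_at_cut(protospacer + pam, pam_start=len(protospacer))
print(left)   # protospacer without its last 3 bp
print(right)  # last 3 bp of the protospacer followed by the PAM
```

A different cleavage position, such as the one observed for the Sulfurospirillum sp. SCADC Cas9 protein, would be modeled by changing `bp_upstream`.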

Taken together, these data further suggest that the methods described herein can be used to characterize novel Cas endonuclease PAM and guide RNA requirements.

Example 17 In Planta Genome Editing with Novel Cas9 Endonucleases

After determining the protospacer adjacent motif (PAM) and guide RNA requirements as described herein, Cas9 proteins with novel PAM recognition were selected and tested for their ability to cleave and mutagenize maize chromosomal DNA as described in Example 12.

To expand the number and diversity of sites available for genome editing, Cas9 proteins with diverse PAM recognition were selected for evaluation in corn by preferentially choosing systems with either A, T or C-rich PAM recognition to best complement the G-rich PAM of the Streptococcus pyogenes (Spy) Cas9 protein. Once systems were selected, DNA target sites adjacent to the appropriate PAM sequence were chosen, and maize-optimized cas9 gene and single guide RNA (sgRNA) expression vectors were constructed and delivered into maize immature embryos as described in Example 12. Embryos were harvested two days after transformation and chromosomal DNA was analyzed for the presence of mutations resulting from DNA target site cleavage and repair as described in Example 12. The frequency of mutations identified at each target site for each Cas9 is listed in Table 30.
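Choosing DNA target sites adjacent to a given PAM can be illustrated with a short Python sketch that scans a sequence for spacers lying directly 5′ of a match to an IUPAC-coded PAM consensus. This is an assumption-laden illustration rather than the procedure used in Example 12: the function name `find_target_sites` and the example sequence are hypothetical, and only the strand supplied is searched.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_target_sites(sequence, pam, spacer_length):
    """Yield (start, spacer, pam_match) for each spacer directly 5' of a PAM match.

    Only the given strand is scanned; the reverse complement is not searched.
    """
    pam_regex = "".join(IUPAC_CLASSES[code] for code in pam)
    # A lookahead is used so that overlapping candidate sites are all reported
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (spacer_length, pam_regex))
    for match in pattern.finditer(sequence.upper()):
        yield match.start(), match.group(1), match.group(2)


# Hypothetical sequence scanned with the Bthe consensus from Table 25 (NNNCCAY)
example = "TTTTACGTACGTACGGGCCATAA"
for start, spacer, pam_match in find_target_sites(example, "NNNCCAY", 10):
    print(start, spacer, pam_match)
```

The spacer length is left as a parameter because the appropriate length may differ between Cas9 proteins.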

Interestingly, the Bifidobacterium thermophilum (Bthe) Cas9 protein failed to effectively mutagenize its target sites. However, when different spacer lengths were tested for Bthe, the frequency of mutagenesis improved dramatically, with a spacer length of around 25 nt being optimal (FIG. 36). Since the minimal spacer length for the Streptococcus pyogenes (Spy) Cas9 sgRNA is approximately 17 nt, it seems that the sgRNA spacer-DNA target interactions for Bthe Cas9 may provide enhanced specificity relative to the Spy Cas9 protein.

TABLE 30 Maize chromosomal target DNA mutation frequencies two days after transformation by particle gun.

Origin of cas9 gene                     DNA Target Location  sgRNA Spacer Length  Mutation Frequency
Bifidobacterium thermophilum DSM 20210  Chr1: 51.81 cM       25                   0.29%
Bifidobacterium thermophilum DSM 20210  Chr9: 119.15 cM      25                   0.05%
Lactobacillus nodensis JCM 14932        Chr1: 51.81 cM       21                   0.06%
Lactobacillus nodensis JCM 14932        Chr9: 119.15 cM      22                   0.28%

Taken together, these results indicate that the methods described herein to characterize Cas endonuclease PAM recognition and guide RNA requirements are robust, ultimately allowing new Cas endonuclease systems to be characterized for genome editing applications.

That which is claimed:
 1. A single guide RNA capable of forming a guide RNA/Cas9 endonuclease complex, wherein said guide RNA/Cas9 endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said single guide RNA is selected from the group consisting of SEQ ID NOs: 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138 and 139.
 2. A single guide RNA capable of forming a guide RNA/Cas9 endonuclease complex, wherein said guide RNA/Cas9 endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said single guide RNA comprises a chimeric non-naturally occurring crRNA linked to a tracrRNA, wherein said tracrRNA comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183 and 184.
 3. A single guide RNA capable of forming a guide RNA/Cas9 endonuclease complex, wherein said guide RNA/Cas9 endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said single guide RNA comprises a chimeric non-naturally occurring crRNA linked to a tracrRNA, wherein said chimeric non-naturally occurring crRNA comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159 and 160.
 4. A guide RNA capable of forming a guide RNA/Cas9 endonuclease complex, wherein said guide RNA/Cas9 endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said guide RNA is a duplex molecule comprising a chimeric non-naturally occurring crRNA and a tracrRNA, wherein said chimeric non-naturally occurring crRNA comprises a variable targeting domain capable of hybridizing to said target sequence, wherein said tracrRNA comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183 and 184, wherein said chimeric non-naturally occurring crRNA comprises a variable targeting domain capable of hybridizing to said target sequence.
 5. A guide RNA capable of forming a guide RNA/Cas9 endonuclease complex, wherein said guide RNA/Cas9 endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said guide RNA is a duplex molecule comprising a chimeric non-naturally occurring crRNA and a tracrRNA, wherein said chimeric non-naturally occurring crRNA comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159 and 160, wherein said chimeric non-naturally occurring crRNA comprises a variable targeting domain capable of hybridizing to said target sequence.
 6. A guide RNA capable of forming a guide RNA/Cas9 endonuclease complex, wherein said guide RNA/Cas9 endonuclease complex can recognize, bind to, and optionally nick or cleave a target sequence, wherein said guide RNA is a duplex molecule comprising a chimeric non-naturally occurring crRNA and a tracrRNA, wherein said tracrRNA comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183 and 184, wherein said chimeric non-naturally occurring crRNA comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159 and 160, wherein said chimeric non-naturally occurring crRNA comprises a variable targeting domain capable of hybridizing to said target sequence.
 7. A guide RNA/Cas9 endonuclease complex comprising a Cas9 endonuclease selected from the group consisting of SEQ ID NOs: 81, 82, 83, 84, 85, 86, 87, 88, 89, 90 and 91, or a functional fragment thereof, and at least one guide RNA, wherein said guide RNA/Cas9 endonuclease complex is capable of recognizing, binding to, and optionally nicking or cleaving all or part of a target sequence.
 8. A guide RNA/Cas9 endonuclease complex comprising at least one guide RNA and a Cas9 endonuclease, wherein said Cas9 endonuclease is encoded by a DNA sequence selected from the group consisting of SEQ ID NOs: 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, and 80, wherein said guide RNA/Cas9 endonuclease complex is capable of recognizing, binding to, and optionally nicking or cleaving all or part of a target sequence.
 9. The guide RNA/Cas9 endonuclease complex of claim 7, wherein said guide RNA is selected from the group consisting of SEQ ID NOs: 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138 and 139.
 10. The guide RNA/Cas9 endonuclease complex of claim 7, wherein said target sequence is located in the genome of a cell.
 11. A method for modifying a target site in the genome of a cell, the method comprising providing to said cell at least one Cas9 endonuclease selected from the group consisting of SEQ ID NOs: 81, 82, 83, 84, 85, 86, 87, 88, 89, 90 and 91, or a functional fragment thereof, and at least one guide RNA, wherein said guide RNA and Cas9 endonuclease can form a complex that is capable of recognizing, binding to, and optionally nicking or cleaving all or part of said target site.
 12. The method of claim 10, further comprising identifying at least one cell that has a modification at said target, wherein the modification at said target site is selected from the group consisting of (i) a replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, and (iv) any combination of (i)-(iii).
 13. A method for editing a nucleotide sequence in the genome of a cell, the method comprising providing to said cell at least one Cas9 endonuclease selected from the group consisting of SEQ ID NOs: 81, 82, 83, 84, 85, 86, 87, 88, 89, 90 and 91, or a functional fragment thereof, a polynucleotide modification template, and at least one guide RNA, wherein said polynucleotide modification template comprises at least one nucleotide modification of said nucleotide sequence, wherein said guide RNA and Cas9 endonuclease can form a complex that is capable of recognizing, binding to, and optionally nicking or cleaving all or part of said target site.
 14. A method for modifying a target site in the genome of a cell, the method comprising providing to said cell at least one guide RNA, at least one donor DNA, and at least one Cas9 endonuclease selected from the group consisting of SEQ ID NOs: 81, 82, 83, 84, 85, 86, 87, 88, 89, 90 and 91, or a functional fragment thereof, wherein said at least one guide RNA and at least one Cas9 endonuclease can form a complex that is capable of recognizing, binding to, and optionally nicking or cleaving all or part of said target site, wherein said donor DNA comprises a polynucleotide of interest.
 15. The method of claim 11, 13 or 14, wherein said guide RNA is selected from the group consisting of SEQ ID NOs: 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138 and 139.
 16. The method of claim 13, further comprising identifying at least one cell that said polynucleotide of interest integrated in or near said target site.
 17. The method of any one of claims 10-14, wherein the cell is selected from the group consisting of a human, non-human, animal, bacterial, fungal, insect, yeast, non-conventional yeast, and plant cell.